1 Introduction

This paper, on solving the dynamics of recurrent neural networks using non-equilibrium statistical
mechanical techniques, is the sequel of [1], which was devoted to solving the statics using equilibrium
techniques. I refer to [1] for a general introduction to recurrent neural networks and their properties.

Equilibrium statistical mechanical techniques can provide detailed quantitative information
on the behaviour of recurrent neural networks, but they have serious restrictions. The first
is that, by definition, they only provide information on network properties in the stationary
state. For associative memories, for instance, it is not clear how one could calculate quantities such as the sizes
of domains of attraction without solving the dynamics. The second, and more serious, restriction is
that for equilibrium statistical mechanics to apply, the dynamics of the network under study must
obey detailed balance, i.e. there must be no microscopic probability currents in the stationary state. As
we have seen in [1], for recurrent networks in which the dynamics take the form of a stochastic
alignment of neuronal firing rates to post-synaptic potentials which, in turn, depend linearly on the
firing rates, this requirement of detailed balance usually implies symmetry of the synaptic matrix.
From a physiological point of view this requirement is clearly unacceptable, since it is violated in any
network that obeys Dale's law as soon as an excitatory neuron is connected to an inhibitory one. Worse
still, we saw in [1] that in any network of graded-response neurons detailed balance will always be
violated, even when the synapses are symmetric. The situation becomes worse still when we turn
to networks of more realistic (spike-based) neurons, such as integrate-and-fire ones. In contrast to
this, non-equilibrium statistical mechanical techniques, it will turn out, do not impose such biologically
unrealistic restrictions on neuron types and synaptic symmetry, and they are consequently the more
appropriate avenue for future theoretical research aimed at solving biologically more realistic models.

The common strategy of all non-equilibrium statistical mechanical studies is to derive and solve
dynamical laws for a suitably small set of relevant macroscopic quantities from the dynamical laws of
the underlying microscopic neuronal system. In order to make progress, as in equilibrium studies, one 
is initially forced to pay the price of having relatively simple model neurons, and of not having a very 
complicated spatial wiring structure in the network under study; the networks described and analysed 
in this paper will consequently be either fully connected, or randomly diluted. When attempting to 
obtain exact dynamical solutions within this class, one then soon finds a clear separation of network 
models into two distinct complexity classes, reflecting in the dynamics a separation which we also 
found in the statics. In statics one could get away with relatively simple mathematical techniques as 
long as the number of attractors of the dynamics was small compared to the number N of neurons. 
As soon as the number of attractors became of the order of N, on the other hand, one entered the
complex regime, requiring the more complicated formalism of replica theory. In dynamics we will again 
find that we can get away with relatively simple mathematical techniques as long as the number of 
attractors remains small, and find closed deterministic differential equations for macroscopic quantities 
with just a single time argument. As soon as we enter the complex regime, however, we will no longer 
find closed equations for one-time macroscopic objects: we will now have to work with correlation and 
response functions, which have two time arguments, and turn to the less trivial generating functional
techniques¹.

In contrast to the situation in statics [1], I cannot in this paper give many references to textbooks
on the dynamics, since these are more or less non-existent. There would appear to be two reasons
for this. Firstly, in most physics departments non-equilibrium statistical mechanics (as a subject)
is generally taught and applied far less intensively than equilibrium statistical mechanics, and thus
the non-equilibrium studies of recurrent neural networks have been considerably fewer in number and
later to appear in the literature than their equilibrium counterparts. Secondly, many of the popular
textbooks on the statistical mechanics of neural networks were written around 1989, roughly at the
point in time where non-equilibrium statistical mechanical studies were just starting to be taken up. When
reading such textbooks one could be forgiven for thinking that solving the dynamics of recurrent
neural networks is generally ruled out, whereas, in fact, nothing could be further from the truth. Thus
the references in this paper will, out of necessity, be mainly to research papers. I regret that, given
constraints on the number of pages and given my aim to explain ideas and techniques in a lecture-notes style
(rather than display encyclopedic skills), I will inevitably have left out relevant references. Another
consequence of the scarce and scattered nature of the literature on the non-equilibrium statistical
mechanics of recurrent neural networks is that a situation has developed where many mathematical
procedures, properties and solutions are more or less known by the research community, but without
there being a clear reference in the literature where these were first formally derived (if at all). Examples
of this are the fluctuation-dissipation theorems for parallel dynamics and the non-equilibrium analysis
of networks with graded-response neurons; often the boundary between accepted general
knowledge and published accepted general knowledge is somewhat fuzzy.

¹ A brief note about terminology: strictly speaking, in this paper we will apply these techniques only to models in
which time is measured in discrete units, so that we should speak about generating functions rather than generating
functionals. However, since these techniques can be and have been applied intensively to models with continuous time,
they are in the literature often referred to as generating functional techniques, for both discrete and continuous time.

The structure of this paper mirrors more or less the structure of [1]. Again I will start with relatively
simple networks, with a small number of attractors (such as systems with uniform synapses, or with a
small number of patterns stored with Hebbian-type rules), which can be solved with relatively simple
mathematical techniques. These will now also include networks that do not evolve to a stationary state, 
and networks of graded response neurons, which could not be studied within equilibrium statistical 
mechanics at all. Next follows a detour on correlation- and response functions and their relations 
(i.e. fluctuation-dissipation theorems), which serves as a prerequisite for the last section on generating 
functional methods, which are indeed formulated in the language of correlation- and response functions. 
In this last, more mathematically involved, section I study symmetric and non-symmetric attractor 
neural networks close to saturation, i.e. in the complex regime. I will show how to solve the dynamics 
of fully connected as well as extremely diluted networks, emphasising the (again) crucial issue of 
presence (or absence) of synaptic symmetry, and compare the predictions of the (exact) generating 
functional formalism to both numerical simulations and simple approximate theories. 

2 Attractor Neural Networks with Binary Neurons 

The simplest non-trivial recurrent neural networks consist of N binary neurons $\sigma_i\in\{-1,1\}$ (see [1])
which respond stochastically to post-synaptic potentials (or local fields) $h_i(\sigma)$, with $\sigma=(\sigma_1,\ldots,\sigma_N)$.
The fields depend linearly on the instantaneous neuron states, $h_i(\sigma)=\sum_j J_{ij}\sigma_j+\theta_i$, with the $J_{ij}$
representing synaptic efficacies, and the $\theta_i$ representing external stimuli and/or neural thresholds.

2.1 Closed Macroscopic Laws for Sequential Dynamics 

First I show how for sequential dynamics (where neurons are updated one after the other) one can
calculate, from the microscopic stochastic laws, differential equations for the probability distribution
of suitably defined macroscopic observables. For mathematical convenience our starting point will be
the continuous-time master equation for the microscopic probability distribution $p_t(\sigma)$:

$$\frac{d}{dt}p_t(\sigma)=\sum_i\Big\{w_i(F_i\sigma)\,p_t(F_i\sigma)-w_i(\sigma)\,p_t(\sigma)\Big\},\qquad w_i(\sigma)=\tfrac12\big[1-\sigma_i\tanh[\beta h_i(\sigma)]\big] \qquad (1)$$

with $F_i\Phi(\sigma)=\Phi(\sigma_1,\ldots,\sigma_{i-1},-\sigma_i,\sigma_{i+1},\ldots,\sigma_N)$ (see [1]). I will discuss the conditions for the evolution of these
macroscopic state variables to become deterministic in the limit of infinitely large
networks and, in addition, be governed by a closed set of equations. I then turn to specific models,
with and without detailed balance, and show how the macroscopic equations can be used to illuminate
and understand the dynamics of attractor neural networks away from saturation.

A Toy Model. Let me illustrate the basic ideas with the help of a simple (infinite range) toy model:
$J_{ij}=(J/N)\eta_i\xi_j$ and $\theta_i=0$ (the variables $\eta_i$ and $\xi_i$ are arbitrary, but may not depend on $N$). For
$\eta_i=\xi_i=1$ we get a network with uniform synapses. For $\eta_i=\xi_i\in\{-1,1\}$ and $J>0$ we recover
the Hopfield […] model with one stored pattern. Note: the synaptic matrix is non-symmetric as soon
as a pair $(ij)$ exists such that $\eta_i\xi_j\neq\eta_j\xi_i$, so in general equilibrium statistical mechanics will not
apply. The local fields become $h_i(\sigma)=J\eta_i m(\sigma)$ with $m(\sigma)=\frac1N\sum_k\xi_k\sigma_k$. Since they depend on the
microscopic state $\sigma$ only through the value of $m$, the latter quantity appears to constitute a natural
macroscopic level of description. The probability density of finding the macroscopic state $m(\sigma)=m$
is given by $\mathcal{P}_t[m]=\sum_\sigma p_t(\sigma)\,\delta[m-m(\sigma)]$. Its time derivative follows upon inserting (1):

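$$\frac{d}{dt}\mathcal{P}_t[m]=\sum_\sigma\sum_k p_t(\sigma)\,w_k(\sigma)\Big\{\delta\big[m-m(\sigma)+\tfrac{2}{N}\xi_k\sigma_k\big]-\delta\big[m-m(\sigma)\big]\Big\}$$
$$=\frac{\partial}{\partial m}\Big\{\sum_\sigma p_t(\sigma)\,\delta[m-m(\sigma)]\,\frac{2}{N}\sum_k\xi_k\sigma_k\,w_k(\sigma)\Big\}+\mathcal{O}\Big(\frac1N\Big)$$

Inserting our expressions for the transition rates $w_i(\sigma)$ and the local fields $h_i(\sigma)$ gives:

$$\frac{d}{dt}\mathcal{P}_t[m]=\frac{\partial}{\partial m}\Big\{\mathcal{P}_t[m]\Big[m-\frac1N\sum_{k=1}^N\xi_k\tanh[\eta_k\beta Jm]\Big]\Big\}+\mathcal{O}(N^{-1})$$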

In the limit $N\to\infty$ only the first term survives. The general solution of the resulting Liouville
equation is $\mathcal{P}_t[m]=\int dm_0\,\mathcal{P}_0[m_0]\,\delta[m-m(t|m_0)]$, where $m(t|m_0)$ is the solution of

$$\frac{d}{dt}m=\lim_{N\to\infty}\frac1N\sum_{k=1}^N\xi_k\tanh[\eta_k\beta Jm]-m,\qquad m(0)=m_0 \qquad (2)$$

This describes deterministic evolution; the only uncertainty in the value of $m$ is due to uncertainty
in initial conditions. If at $t=0$ the quantity $m$ is known exactly, this will remain the case for finite
time-scales; $m$ turns out to evolve in time according to (2).
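The deterministic law (2) can be checked against a finite-N simulation. The following is a minimal sketch, assuming uniform synapses ($\eta_i=\xi_i=1$, so $h_i=Jm$ for every neuron) and approximating the continuous-time master equation (1) by random sequential updates with time step $1/N$ (a common discretisation of our own choosing, not taken from the text); the function names are illustrative.

```python
import math
import random


def integrate_flow(m0, beta, J, t_max, dt=1e-3):
    """Forward-Euler integration of (2) for eta_k = xi_k = 1:
    dm/dt = tanh(beta*J*m) - m."""
    m = m0
    for _ in range(int(round(t_max / dt))):
        m += dt * (math.tanh(beta * J * m) - m)
    return m


def simulate_sequential(N, m0, beta, J, t_max, seed=0):
    """Random sequential Glauber dynamics for J_ij = J/N.

    One unit of time corresponds to N update attempts; the chosen spin i
    flips with probability w_i = (1 - sigma_i*tanh(beta*h_i))/2, where
    h_i = J*m(sigma) for every i in this toy model.
    """
    rng = random.Random(seed)
    n_up = int(round(N * (1 + m0) / 2))
    spins = [1] * n_up + [-1] * (N - n_up)
    total = sum(spins)  # equals N * m(sigma)
    for _ in range(int(round(t_max * N))):
        i = rng.randrange(N)
        s = spins[i]
        w = 0.5 * (1.0 - s * math.tanh(beta * J * total / N))
        if rng.random() < w:
            spins[i] = -s
            total -= 2 * s
    return total / N


ode = integrate_flow(0.3, beta=2.0, J=1.0, t_max=8.0)
sim = simulate_sequential(20000, 0.3, beta=2.0, J=1.0, t_max=8.0, seed=1)
print(f"theory m(8) = {ode:.4f}, simulation (N=20000) m(8) = {sim:.4f}")
```

The residual discrepancy is a finite-size effect; the stochastic deviation of $m$ from the deterministic curve shrinks roughly like $N^{-1/2}$.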

Arbitrary Synapses. Let us now allow for less trivial choices of the synaptic matrix $\{J_{ij}\}$ and try to
calculate the evolution in time of a given set of macroscopic observables $\Omega(\sigma)=(\Omega_1(\sigma),\ldots,\Omega_n(\sigma))$ in
the limit $N\to\infty$. There are no restrictions yet on the form or the number $n$ of these state variables;
these will, however, arise naturally if we require the observables $\Omega$ to obey a closed set of deterministic
laws, as we will see. The probability density of finding the system in macroscopic state $\Omega$ is given by:

$$\mathcal{P}_t[\Omega]=\sum_\sigma p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)] \qquad (3)$$

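Its time derivative is obtained by inserting (1). If in those parts of the resulting expression which
contain the operators $F_i$ we perform the transformations $\sigma\to F_i\sigma$, we arrive at

$$\frac{d}{dt}\mathcal{P}_t[\Omega]=\sum_i\sum_\sigma p_t(\sigma)\,w_i(\sigma)\Big\{\delta[\Omega-\Omega(F_i\sigma)]-\delta[\Omega-\Omega(\sigma)]\Big\}$$

Upon writing $\Omega_\mu(F_i\sigma)=\Omega_\mu(\sigma)+\Delta_{i\mu}(\sigma)$ and making a Taylor expansion in powers of $\{\Delta_{i\mu}(\sigma)\}$, we
finally obtain the so-called Kramers-Moyal expansion²:

$$\frac{d}{dt}\mathcal{P}_t[\Omega]=\sum_{\ell\geq1}\frac{(-1)^\ell}{\ell!}\sum_{\mu_1=1}^n\cdots\sum_{\mu_\ell=1}^n\frac{\partial^\ell}{\partial\Omega_{\mu_1}\cdots\partial\Omega_{\mu_\ell}}\Big\{\mathcal{P}_t[\Omega]\,F^{(\ell)}_{\mu_1\cdots\mu_\ell}[\Omega;t]\Big\} \qquad (4)$$

It involves conditional averages $\langle f(\sigma)\rangle_{\Omega;t}$ and the 'discrete derivatives' $\Delta_{j\mu}(\sigma)=\Omega_\mu(F_j\sigma)-\Omega_\mu(\sigma)$:

$$F^{(\ell)}_{\mu_1\cdots\mu_\ell}[\Omega;t]=\Big\langle\sum_{j=1}^N w_j(\sigma)\,\Delta_{j\mu_1}(\sigma)\cdots\Delta_{j\mu_\ell}(\sigma)\Big\rangle_{\Omega;t},\qquad \langle f(\sigma)\rangle_{\Omega;t}=\frac{\sum_\sigma p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)]\,f(\sigma)}{\sum_\sigma p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)]} \qquad (5)$$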

Retaining only the $\ell=1$ term in (4) would lead us to a Liouville equation, which describes deterministic
flow in $\Omega$ space. Including also the $\ell=2$ term leads us to a Fokker-Planck equation which, in addition
to flow, describes diffusion of the macroscopic probability density. Thus a sufficient condition for the
observables $\Omega(\sigma)$ to evolve in time deterministically in the limit $N\to\infty$ is:

$$\lim_{N\to\infty}\sum_{\ell\geq2}\frac{1}{\ell!}\sum_{\mu_1=1}^n\cdots\sum_{\mu_\ell=1}^n\sum_{j=1}^N\Big\langle\big|\Delta_{j\mu_1}(\sigma)\cdots\Delta_{j\mu_\ell}(\sigma)\big|\Big\rangle_{\Omega;t}=0 \qquad (6)$$

In the simple case where all observables $\Omega_\mu$ scale similarly, in the sense that all 'derivatives' $\Delta_{j\mu}=\Omega_\mu(F_j\sigma)-\Omega_\mu(\sigma)$
are of the same order in $N$ (i.e. there is a monotonic function $\tilde\Delta_N$ such that
$\Delta_{j\mu}=\mathcal{O}(\tilde\Delta_N)$ for all $j\mu$), criterion (6) becomes:

$$\lim_{N\to\infty}n\,\tilde\Delta_N\sqrt{N}=0 \qquad (7)$$
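The step from (6) to (7) can be made explicit with a short worked estimate. This is a sketch under the assumption, which is our own phrasing of "of the same order", that $|\Delta_{j\mu}(\sigma)|\leq c\,\tilde\Delta_N$ for some constant $c$ and all $j,\mu$:

```latex
% Assumption: |Delta_{j mu}(sigma)| <= c * tildeDelta_N for all j, mu and some constant c.
\sum_{\ell\geq2}\frac{1}{\ell!}\sum_{\mu_1=1}^{n}\cdots\sum_{\mu_\ell=1}^{n}\sum_{j=1}^{N}
  \Big\langle\big|\Delta_{j\mu_1}(\sigma)\cdots\Delta_{j\mu_\ell}(\sigma)\big|\Big\rangle_{\Omega;t}
\;\leq\; N\sum_{\ell\geq2}\frac{(c\,n\tilde\Delta_N)^{\ell}}{\ell!}
\;=\; N\Big[e^{c\,n\tilde\Delta_N}-1-c\,n\tilde\Delta_N\Big]
\;=\; \frac{c^2}{2}\big(n\tilde\Delta_N\sqrt{N}\big)^2\Big[1+\mathcal{O}(n\tilde\Delta_N)\Big]
```

The last step uses $n\tilde\Delta_N\to0$, which (7) itself guarantees, so the bound vanishes whenever (7) holds. For the toy model above, $n=1$ and $|\Delta_j|=2/N$, so $n\tilde\Delta_N\sqrt N=2/\sqrt N\to0$.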

If for a given set of observables condition (7) is satisfied, we can for large $N$ describe the evolution of
the macroscopic probability density by a Liouville equation:

$$\frac{\partial}{\partial t}\mathcal{P}_t[\Omega]=-\sum_{\mu=1}^n\frac{\partial}{\partial\Omega_\mu}\Big\{\mathcal{P}_t[\Omega]\,F^{(1)}_\mu[\Omega;t]\Big\}$$

whose solution describes deterministic flow: $\mathcal{P}_t[\Omega]=\int d\Omega_0\,\mathcal{P}_0[\Omega_0]\,\delta[\Omega-\Omega(t|\Omega_0)]$, with $\Omega(t|\Omega_0)$ given,
in turn, as the solution of

$$\frac{d}{dt}\Omega(t)=F^{(1)}[\Omega(t);t],\qquad \Omega(0)=\Omega_0 \qquad (8)$$

In taking the limit $N\to\infty$, however, we have to keep in mind that the resulting deterministic theory
is obtained by taking this limit for finite $t$. According to (4) the $\ell>1$ terms do come into play for
sufficiently large times $t$; for $N\to\infty$, however, these times diverge by virtue of (6).

The Issue of Closure. Equation (8) will in general not be autonomous; tracing back the origin of the
explicit time dependence in the right-hand side of (8) one finds that to calculate $F^{(1)}$ one needs to
know the microscopic probability density $p_t(\sigma)$. This, in turn, requires solving equation (1) (which is
exactly what one tries to avoid). We will now discuss a mechanism via which to eliminate the offending
explicit time dependence, and to turn the observables $\Omega(\sigma)$ into an autonomous level of description,
governed by closed dynamic laws. The idea is to choose the observables $\Omega(\sigma)$ in such a way that there
is no explicit time dependence in the flow field $F^{(1)}[\Omega;t]$ (if possible). According to (5) this implies
making sure that there exist functions $\Phi_\mu[\Omega]$ such that

$$\lim_{N\to\infty}\sum_{j=1}^N w_j(\sigma)\,\Delta_{j\mu}(\sigma)=\Phi_\mu[\Omega(\sigma)] \qquad (9)$$

in which case the time dependence of $F^{(1)}$ indeed drops out and the macroscopic state vector simply
evolves in time according to:

$$\frac{d}{dt}\Omega=\Phi[\Omega],\qquad \Phi[\Omega]=(\Phi_1[\Omega],\ldots,\Phi_n[\Omega])$$

² Expansion (4) is to be interpreted in a distributional sense, i.e. only to be used in expressions of the form
$\int d\Omega\,\mathcal{P}_t(\Omega)G(\Omega)$ with smooth functions $G(\Omega)$, so that all derivatives are well-defined and finite. Furthermore,
(4) will only be useful if the $\Delta_{j\mu}$, which measure the sensitivity of the macroscopic quantities to single neuron state changes,
are sufficiently small. This is to be expected: for finite $N$ any observable can only assume a finite number of possible
values; only for $N\to\infty$ may we expect smooth probability distributions for our macroscopic quantities.

Clearly, for this closure method to apply, a suitable separable structure of the synaptic matrix is
required. If, for instance, the macroscopic observables $\Omega_\mu$ depend linearly on the microscopic state
variables $\sigma$ (i.e. $\Omega_\mu(\sigma)=\frac1N\sum_{j=1}^N\omega_{\mu j}\sigma_j$), we obtain with the transition rates defined in (1):

$$\frac{d}{dt}\Omega_\mu=\lim_{N\to\infty}\frac1N\sum_{j=1}^N\omega_{\mu j}\tanh(\beta h_j(\sigma))-\Omega_\mu \qquad (10)$$

in which case the only further condition for (9) to hold is that all local fields $h_k(\sigma)$ must (in leading
order in $N$) depend on the microscopic state $\sigma$ only through the values of the observables $\Omega$. Since
the local fields depend linearly on $\sigma$, this in turn implies that the synaptic matrix must be separable:
if $J_{ij}=\frac1N\sum_\mu K_{i\mu}\omega_{\mu j}$ then indeed $h_i(\sigma)=\sum_\mu K_{i\mu}\Omega_\mu(\sigma)+\theta_i$. Next I will show how this approach can
be applied to networks for which the matrix of synapses has a separable form (which includes most
symmetric and non-symmetric Hebbian-type attractor models). I will restrict myself to models with
$\theta_i=0$; introducing non-zero thresholds is straightforward and does not pose new problems.
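The separability requirement can be checked numerically on a small random instance. This is a minimal sketch, assuming the normalisation $J_{ij}=N^{-1}\sum_\mu K_{i\mu}\omega_{\mu j}$ (so that the fields stay $\mathcal{O}(1)$ for observables $\Omega_\mu=N^{-1}\sum_j\omega_{\mu j}\sigma_j$); the arrays $K$, $\omega$, $\theta$ and the state $\sigma$ are drawn at random purely for illustration.

```python
import random


def fields_direct(J, sigma, theta):
    """Microscopic fields h_i = sum_j J_ij sigma_j + theta_i (O(N^2) work)."""
    return [sum(Jij * s for Jij, s in zip(row, sigma)) + th
            for row, th in zip(J, theta)]


def fields_from_observables(K, omega, sigma, theta):
    """Fields via the n observables Omega_mu = (1/N) sum_j omega_{mu j} sigma_j:
    h_i = sum_mu K_{i mu} Omega_mu + theta_i (O(nN) work)."""
    N = len(sigma)
    Omega = [sum(w * s for w, s in zip(row, sigma)) / N for row in omega]
    return [sum(k * O for k, O in zip(K_row, Omega)) + th
            for K_row, th in zip(K, theta)]


rng = random.Random(0)
N, n = 200, 3
K = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(N)]
omega = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(n)]
theta = [rng.gauss(0.0, 0.1) for _ in range(N)]
# Separable synapses J_ij = (1/N) sum_mu K_{i mu} omega_{mu j}
J = [[sum(K[i][mu] * omega[mu][j] for mu in range(n)) / N for j in range(N)]
     for i in range(N)]
sigma = [rng.choice((-1, 1)) for _ in range(N)]

h_micro = fields_direct(J, sigma, theta)
h_macro = fields_from_observables(K, omega, sigma, theta)
print("max |difference|:", max(abs(a - b) for a, b in zip(h_micro, h_macro)))
```

The two computations agree to rounding error, while only the second needs nothing beyond the $n$ numbers $\Omega_\mu$; this is the property that makes (10) a closed set of equations.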



2.2 Application to Separable Attractor Networks 

Separable models: Description at the Level of Sublattice Activities. We consider the following class of
models, in which the interaction matrices have the form

$$J_{ij}=\frac1N Q(\xi_i;\xi_j),\qquad \xi_i=(\xi_i^1,\ldots,\xi_i^p) \qquad (11)$$

The components $\xi_i^\mu$, representing the information ('patterns') to be stored or processed, are assumed
to be drawn from a finite discrete set $\Lambda$, containing $n_\Lambda$ elements (they are not allowed to depend on
$N$). The Hopfield model corresponds to choosing $Q(x;y)=x\cdot y$ and $\Lambda=\{-1,1\}$. One now
introduces a partition of the system $\{1,\ldots,N\}$ into so-called sublattices $I_\eta$:

$$I_\eta=\{i\,|\,\xi_i=\eta\},\qquad \{1,\ldots,N\}=\bigcup_\eta I_\eta,\qquad \eta\in\Lambda^p \qquad (12)$$

The number of neurons in sublattice $I_\eta$ is denoted by $|I_\eta|$ (this number will have to be large). If we
choose as our macroscopic observables the average activities ('magnetisations') within these sublattices,
we are able to express the local fields solely in terms of macroscopic quantities:

$$m_\eta(\sigma)=\frac{1}{|I_\eta|}\sum_{i\in I_\eta}\sigma_i,\qquad h_k(\sigma)=\sum_\eta p_\eta\,Q(\xi_k;\eta)\,m_\eta \qquad (13)$$
with the relative sublattice sizes $p_\eta=|I_\eta|/N$. If all $p_\eta$ are of the same order in $N$ (which, for
example, is the case if the vectors $\xi_i$ have been drawn at random from the set $\Lambda^p$) we may write
$\Delta_{j\eta}=\mathcal{O}(n_\Lambda^pN^{-1})$ and use (7). The evolution in time of the sublattice activities is then found to be
deterministic in the $N\to\infty$ limit if $\lim_{N\to\infty}p/\log N=0$. Furthermore, condition (9) holds, since

$$\sum_{j=1}^N w_j(\sigma)\,\Delta_{j\eta}(\sigma)=\tanh\Big[\beta\sum_{\eta'}p_{\eta'}Q(\eta;\eta')\,m_{\eta'}\Big]-m_\eta$$

We may conclude that the situation is that described by (10), and that the evolution in time of the
sublattice activities is governed by the following autonomous set of differential equations […]:

$$\frac{d}{dt}m_\eta=\tanh\Big[\beta\sum_{\eta'}p_{\eta'}Q(\eta;\eta')\,m_{\eta'}\Big]-m_\eta \qquad (14)$$



We see that, in contrast to the equilibrium techniques as described in [1], here there is no need at all
to require symmetry of the interaction matrix or absence of self-interactions. In the symmetric case
$Q(x;y)=Q(y;x)$ the system will approach equilibrium; if the kernel $Q$ is positive definite this can
be shown, for instance, by inspection of the Lyapunov function³ $\mathcal{L}\{m_\eta\}$:

$$\mathcal{L}\{m_\eta\}=\frac12\sum_{\eta\eta'}p_\eta m_\eta\,Q(\eta;\eta')\,m_{\eta'}p_{\eta'}-\frac1\beta\sum_\eta p_\eta\log\cosh\Big[\beta\sum_{\eta'}Q(\eta;\eta')\,m_{\eta'}p_{\eta'}\Big]$$

which is bounded from below and obeys:

$$\frac{d}{dt}\mathcal{L}=-\sum_{\eta\eta'}\Big[p_\eta\frac{d}{dt}m_\eta\Big]Q(\eta;\eta')\Big[p_{\eta'}\frac{d}{dt}m_{\eta'}\Big]\leq0 \qquad (15)$$



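Note that from the sublattice activities, in turn, follow the 'overlaps' $m_\mu(\sigma)$ (see [1]):

$$m_\mu(\sigma)=\frac1N\sum_{i=1}^N\xi_i^\mu\sigma_i=\sum_\eta p_\eta\,\eta_\mu\,m_\eta \qquad (16)$$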
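As an illustration of (14), (15) and (16), the sketch below integrates the sublattice equations for the Hopfield kernel $Q(x;y)=x\cdot y$ with $p=2$ and $p_\eta=\frac14$ (the randomly-drawn-pattern case), records the Lyapunov function after every step, and reads off the overlaps via (16). The Euler step, initial state and noise level are our own illustrative choices; the Hopfield kernel is only positive semi-definite on the four sublattice labels, but the decrease of $\mathcal{L}$ in (15) still holds for it.

```python
import itertools
import math

P = 2
ETAS = list(itertools.product((-1, 1), repeat=P))  # the 4 sublattice labels
P_ETA = {e: 0.25 for e in ETAS}                    # p_eta = 1/4


def Q(x, y):
    """Hopfield kernel Q(x;y) = x . y."""
    return sum(a * b for a, b in zip(x, y))


def field(m, eta):
    """sum_{eta'} p_{eta'} Q(eta;eta') m_{eta'}: the local field in sublattice eta."""
    return sum(P_ETA[e2] * Q(eta, e2) * m[e2] for e2 in ETAS)


def flow(m, beta):
    """Right-hand side of (14)."""
    return {e: math.tanh(beta * field(m, e)) - m[e] for e in ETAS}


def lyapunov(m, beta):
    """The Lyapunov function L{m_eta} defined above (15), for symmetric Q."""
    quad = 0.5 * sum(P_ETA[e] * m[e] * field(m, e) for e in ETAS)
    lc = sum(P_ETA[e] * math.log(math.cosh(beta * field(m, e))) for e in ETAS)
    return quad - lc / beta


def overlaps(m):
    """Overlaps (16): m_mu = sum_eta p_eta eta_mu m_eta."""
    return [sum(P_ETA[e] * e[mu] * m[e] for e in ETAS) for mu in range(P)]


beta, dt, steps = 2.0, 0.01, 3000
m = {e: 0.6 * e[0] + 0.1 * e[1] for e in ETAS}  # initial overlaps (0.6, 0.1)
L_values = [lyapunov(m, beta)]
for _ in range(steps):
    f = flow(m, beta)
    m = {e: m[e] + dt * f[e] for e in ETAS}
    L_values.append(lyapunov(m, beta))

m1, m2 = overlaps(m)
print(f"final overlaps: ({m1:.4f}, {m2:.2e}); L decreased from "
      f"{L_values[0]:.4f} to {L_values[-1]:.4f}")
```

The run ends in the pure state retrieving the first pattern, with $m_1=\tanh(\beta m_1)$, and $\mathcal{L}$ never increases along the way.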



Simple examples of relevant models of the type (11), the dynamics of which are for large $N$ described
by equation (14), are for instance the ones where one applies a non-linear operation $\Phi$ to the standard
Hopfield-type […] (or Hebbian-type) interactions. This non-linearity could result from e.g. a clipping
procedure or from retaining only the sign of the Hebbian values:

$$J_{ij}=\frac1N\Phi\Big(\sum_{\mu\leq p}\xi_i^\mu\xi_j^\mu\Big):\qquad \text{e.g.}\quad \Phi(x)=\begin{cases}-K & \text{for } x\leq-K\\ x & \text{for } -K<x<K\\ K & \text{for } x\geq K\end{cases}\qquad\text{or}\quad \Phi(x)=\mathrm{sgn}(x)$$

The effect of introducing such non-linearities is found to be of a quantitative nature, giving rise to
little more than a re-scaling of critical noise levels and storage capacities. I will not go into full details,
which can be found in e.g. […], but illustrate this statement by working out the $p=2$ equations for
randomly drawn pattern bits $\xi_i^\mu\in\{-1,1\}$, where there are only four sub-lattices, and where $p_\eta=\frac14$
for all $\eta$. Using $\Phi(0)=0$ and $\Phi(-x)=-\Phi(x)$ (as with the above examples) we obtain from (14):



$$\frac{d}{dt}m_\eta=\tanh\Big[\tfrac14\beta\,\Phi(2)\,(m_\eta-m_{-\eta})\Big]-m_\eta \qquad (17)$$



³ A function of the state variables which is bounded from below and whose value decreases monotonically during the
dynamics, see e.g. […]. Its existence guarantees evolution towards a stationary state (under some weak conditions).
Here the choice made for $\Phi(x)$ shows up only as a re-scaling of the temperature. From (17) we further
obtain $\frac{d}{dt}(m_\eta+m_{-\eta})=-(m_\eta+m_{-\eta})$. The system decays exponentially towards a state where
$m_\eta=-m_{-\eta}$ for all $\eta$. If at $t=0$ this is already the case, we find (at least for
$p=2$) decoupled equations for the sub-lattice activities.
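The claim that $\Phi$ only rescales the temperature can be seen in a numerical sketch of (17): $\Phi$ enters only through the product $\beta\Phi(2)$, so the linear choice $\Phi(x)=x$ at inverse temperature $\beta$ should reproduce the clipped and sign versions at $2\beta$. The sketch assumes a clipping level $K=1$ (so $\Phi(2)=1$); the initial sublattice activities are arbitrary illustrative values.

```python
import math


def phi_linear(x):
    return x


def phi_clip(x, K=1.0):
    """Clipped Hebbian value: -K, x or K depending on where x lies."""
    return max(-K, min(K, x))


def phi_sign(x):
    return float((x > 0) - (x < 0))


def integrate_p2(m_init, beta, phi, t_max, dt=1e-3):
    """Euler integration of (17) for the four sublattices eta in {-1,1}^2."""
    m = dict(m_init)
    c = 0.25 * beta * phi(2.0)
    for _ in range(int(round(t_max / dt))):
        # the comprehension reads the old dict m, so all sublattices update together
        m = {e: m[e] + dt * (math.tanh(c * (m[e] - m[(-e[0], -e[1])])) - m[e])
             for e in m}
    return m


m0 = {(1, 1): 0.5, (1, -1): 0.2, (-1, 1): -0.1, (-1, -1): -0.3}
lin = integrate_p2(m0, 1.0, phi_linear, 2.0)   # Phi(x) = x at beta = 1
clip = integrate_p2(m0, 2.0, phi_clip, 2.0)    # clipped, K = 1, at beta = 2
sgn = integrate_p2(m0, 2.0, phi_sign, 2.0)     # sign, at beta = 2
s0 = m0[(1, 1)] + m0[(-1, -1)]
s_t = lin[(1, 1)] + lin[(-1, -1)]
print("m_eta + m_-eta at t = 2:", s_t, "vs s0*exp(-2) =", s0 * math.exp(-2.0))
```

The three runs coincide, and the combination $m_\eta+m_{-\eta}$ decays as $e^{-t}$ independently of $\Phi$.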

Separable Models: Description at the Level of Overlaps. Equations (14,16) suggest that at the level of
overlaps there will be, in turn, closed laws if the kernel $Q$ is bi-linear⁴, $Q(x;y)=\sum_{\mu\nu}x_\mu A_{\mu\nu}y_\nu$, or:

$$J_{ij}=\frac1N\sum_{\mu\nu=1}^p\xi_i^\mu A_{\mu\nu}\xi_j^\nu,\qquad \xi_i=(\xi_i^1,\ldots,\xi_i^p) \qquad (18)$$

We will see that now the $\xi_i^\mu$ need not be drawn from a finite discrete set (as long as they do not depend
on $N$). The Hopfield model corresponds to $A_{\mu\nu}=\delta_{\mu\nu}$ and $\xi_i^\mu\in\{-1,1\}$. The fields can now be
written in terms of the overlaps $m_\mu$:

$$h_k(\sigma)=\xi_k\cdot Am(\sigma),\qquad m=(m_1,\ldots,m_p),\qquad m_\mu(\sigma)=\frac1N\sum_{i=1}^N\xi_i^\mu\sigma_i \qquad (19)$$

For this choice of macroscopic variables we find $\Delta_{j\mu}=\mathcal{O}(N^{-1})$, so the evolution of the vector $m$
becomes deterministic for $N\to\infty$ if, according to (7), $\lim_{N\to\infty}p/\sqrt N=0$. Again (9) holds, since

$$\sum_{j=1}^N w_j(\sigma)\,\Delta_{j\mu}(\sigma)=\frac1N\sum_{k=1}^N\xi_k^\mu\tanh[\beta\xi_k\cdot Am]-m_\mu$$
j=i k=i 

Thus the evolution in time of the overlap vector $m$ is governed by a closed set of differential equations:

$$\frac{d}{dt}m=\big\langle\xi\tanh[\beta\xi\cdot Am]\big\rangle_\xi-m,\qquad \langle\Phi(\xi)\rangle_\xi=\int d\xi\,\rho(\xi)\,\Phi(\xi) \qquad (20)$$

with $\rho(\xi)=\lim_{N\to\infty}N^{-1}\sum_i\delta[\xi-\xi_i]$. Symmetry of the synapses is not required. For certain non-symmetric
matrices $A$ one finds stable limit-cycle solutions of (20). In the symmetric case $A_{\mu\nu}=A_{\nu\mu}$
the system will approach equilibrium; the Lyapunov function of (15) for positive definite matrices $A$ now
becomes:

$$\mathcal{L}\{m\}=\frac12m\cdot Am-\frac1\beta\big\langle\log\cosh[\beta\xi\cdot Am]\big\rangle_\xi$$

Figure 1 shows in the $(m_1,m_2)$-plane the result of solving the macroscopic laws (20) numerically for
$p=2$, randomly drawn pattern bits $\xi_i^\mu\in\{-1,1\}$, and two choices of the matrix $A$. The first choice
(upper row) corresponds to the Hopfield model; as the noise level $T=\beta^{-1}$ increases, the amplitudes of
the four attractors (corresponding to the two patterns $\xi^\mu$ and their mirror images $-\xi^\mu$) continuously
decrease, until at the critical noise level $T_c=1$ (see also [1]) they merge into the trivial attractor
$m=(0,0)$. The second choice corresponds to a non-symmetric model (i.e. without detailed balance);
at the macroscopic level of description (at finite time-scales) the system clearly does not approach
equilibrium; macroscopic order now manifests itself in the form of a limit-cycle (provided the noise
level $T$ is below the critical value $T_c=1$ where this limit-cycle is destroyed). The extent to which the
laws (20) agree with the results of actual simulations of finite systems is
illustrated in figure 2. Other examples can be found in […].
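Both regimes can be reproduced with a few lines of forward-Euler integration of (20) for $p=2$ and uniformly distributed $\xi\in\{-1,1\}^2$. This is a sketch only: `ROTATING` is an illustrative non-symmetric matrix of our own choosing (its linearisation $\beta A-1$ at $m=0$ also gives $T_c=1$) and is not necessarily the matrix used for figure 1; the step size and initial points are likewise our own.

```python
import itertools
import math

XI = list(itertools.product((-1, 1), repeat=2))  # rho(xi): uniform over {-1,1}^2


def overlap_flow(m, A, beta):
    """Right-hand side of (20): <xi tanh[beta xi.Am]>_xi - m, for p = 2."""
    Am = [A[0][0] * m[0] + A[0][1] * m[1], A[1][0] * m[0] + A[1][1] * m[1]]
    avg = [0.0, 0.0]
    for xi in XI:
        t = math.tanh(beta * (xi[0] * Am[0] + xi[1] * Am[1]))
        avg[0] += xi[0] * t / len(XI)
        avg[1] += xi[1] * t / len(XI)
    return [avg[0] - m[0], avg[1] - m[1]]


def trajectory(m0, A, T, t_max, dt=0.01):
    """Forward-Euler trajectory of (20) at noise level T = 1/beta."""
    m = list(m0)
    path = [tuple(m)]
    for _ in range(int(round(t_max / dt))):
        f = overlap_flow(m, A, 1.0 / T)
        m = [m[0] + dt * f[0], m[1] + dt * f[1]]
        path.append(tuple(m))
    return path


HOPFIELD = [[1.0, 0.0], [0.0, 1.0]]
ROTATING = [[1.0, 1.0], [-1.0, 1.0]]  # illustrative non-symmetric choice

fixed = trajectory((0.8, 0.1), HOPFIELD, T=0.6, t_max=40.0)[-1]
cycle = trajectory((0.05, 0.0), ROTATING, T=0.6, t_max=80.0)
tail = cycle[len(cycle) // 2:]  # discard the transient
print("Hopfield end point:", fixed)
print("min/max |m| on the cycle:",
      min(math.hypot(*p) for p in tail), max(math.hypot(*p) for p in tail))
```

With the symmetric matrix the trajectory settles on a pure state; with the non-symmetric one it keeps circling the unstable origin at finite amplitude, as described for figure 1.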

⁴ Strictly speaking, it is already sufficient to have a kernel which is linear in $y$ only, i.e. $Q(x;y)=\sum_\nu f_\nu(x)y_\nu$.
[Figure 1 panel label: T=0.6]
Figure 1: Flow diagrams obtained by numerically solving equations (20) for $p=2$. Upper row:
$A_{\mu\nu}=\delta_{\mu\nu}$ (the Hopfield model); lower row: a non-symmetric matrix $A=(\ldots)$ (here the critical noise level is $T_c=1$).



[Figure 2 legend: N=1000, N=3000, Theory]
Figure 2: Comparison between simulation results for finite systems ($N=1000$ and $N=3000$) and
the $N=\infty$ analytical prediction (20), for $p=2$, $T=0.8$ and a non-symmetric matrix $A=(\ldots)$.



As a second simple application of the flow equations (20) we turn to the relaxation times corresponding
to the attractors of the Hopfield model (where $A_{\mu\nu}=\delta_{\mu\nu}$). Expanding (20) near a stable
fixed-point $m^*$, i.e. $m(t)=m^*+x(t)$ with $|x(t)|\ll1$, gives the linearised equation

$$\frac{d}{dt}x_\mu=\sum_\nu\Big[\beta\big\langle\xi_\mu\xi_\nu\big(1-\tanh^2[\beta\xi\cdot m^*]\big)\big\rangle_\xi-\delta_{\mu\nu}\Big]x_\nu+\mathcal{O}(x^2) \qquad (21)$$

The Jacobian of (20), which determines the linearised equation (21), turns out to be minus the
curvature matrix of the free energy surface at the fixed-point (cf. the derivations in [1]). The
asymptotic relaxation towards any stable attractor is generally exponential, with a characteristic time
$\tau$ given by the inverse of the smallest eigenvalue of the curvature matrix. If, in particular, for the
fixed point $m^*$ we substitute an $n$-mixture state, i.e. $m_\mu=m_n$ ($\mu\leq n$) and $m_\mu=0$ ($\mu>n$), and
transform (21) to the basis where the corresponding curvature matrix $D^{(n)}$ (with eigenvalues $D^n_\lambda$) is
diagonal, $x\to\tilde x$, we obtain
Figure 3: Asymptotic relaxation times $\tau_n$ of the mixture states of the Hopfield model as a function of
the noise level $T=\beta^{-1}$. From bottom to top: $n=1,3,5,7,9,11,13$.



$$\tilde x_\lambda(t)=\tilde x_\lambda(0)\,e^{-tD^n_\lambda}+\ldots$$

so $\tau_n^{-1}=\min_\lambda D^n_\lambda$, which we have already calculated (see [1]) in determining the character of the
saddle-points of the free-energy surface. The result is shown in figure 3. The relaxation time for the
$n$-mixture attractors increases monotonically with the degree of mixing $n$, for any noise level. At
the transition where a macroscopic state $m^*$ ceases to correspond to a local minimum of the free
energy surface, it also de-stabilises in terms of the linearised dynamic equation (21) (as it should).
The Jacobian develops a zero eigenvalue, the relaxation time diverges, and the long-time behaviour is
no longer obtained from the linearised equation. This gives rise to critical slowing down (power-law
relaxation as opposed to exponential relaxation). For instance, at the transition temperature $T_c=1$
for the $n=1$ (pure) state, we find by expanding (20):

$$\frac{d}{dt}m_\mu=m_\mu\Big[\tfrac23m_\mu^2-m^2\Big]+\mathcal{O}(m^5)$$

which gives rise to a relaxation towards the trivial fixed-point of the form $m\sim t^{-1/2}$.
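The relaxation-time statements can be made concrete for the pure ($n=1$) state. The sketch assumes the Hopfield case with binary patterns, for which (21) at $m^*=(m,0,\ldots,0)$ gives a diagonal curvature matrix with all eigenvalues equal to $1-\beta(1-m^2)$ (a short calculation of our own, using $\tanh^2(\beta m)=m^2$). It checks this against the decay of a small perturbation and then integrates (20) at $T_c=1$ along the pure-state direction, where the expansion above reduces to $\dot m=-m^3/3$ and hence $m\sqrt t\to\sqrt{3/2}$. Step sizes and initial values are illustrative.

```python
import math


def pure_overlap(beta, iters=20000):
    """Non-trivial root of m = tanh(beta*m) for beta > 1, by fixed-point iteration."""
    m = 1.0
    for _ in range(iters):
        m = math.tanh(beta * m)
    return m


def tau_pure(T):
    """Relaxation time 1/(1 - beta*(1 - m^2)) of the pure Hopfield state."""
    beta = 1.0 / T
    m = pure_overlap(beta)
    return 1.0 / (1.0 - beta * (1.0 - m * m))


def measured_tau(T, delta=1e-3, dt=1e-3):
    """Estimate tau from the decay of a perturbation of m* between t = 5 and t = 10,
    integrating dm/dt = tanh(beta*m) - m (the pure-state direction of (20))."""
    beta = 1.0 / T
    m_star = pure_overlap(beta)
    m = m_star + delta
    n5 = int(round(5.0 / dt))
    x5 = None
    for k in range(1, 2 * n5 + 1):
        m += dt * (math.tanh(beta * m) - m)
        if k == n5:
            x5 = m - m_star
    x10 = m - m_star
    return 5.0 / math.log(x5 / x10)


def critical_decay(m0=0.5, t_max=1000.0, dt=0.01):
    """Euler integration of dm/dt = tanh(m) - m, i.e. (20) along a pure state at T = 1."""
    m = m0
    for _ in range(int(round(t_max / dt))):
        m += dt * (math.tanh(m) - m)
    return m


for T in (0.3, 0.5, 0.7, 0.9):
    print(f"T = {T}: tau (curvature) = {tau_pure(T):.4f}, tau (measured) = {measured_tau(T):.4f}")
m_end = critical_decay()
print("m*sqrt(t) at t = 1000:", m_end * math.sqrt(1000.0), "vs sqrt(3/2) =", math.sqrt(1.5))
```

The relaxation time grows as $T$ approaches $T_c=1$, and exactly at $T_c$ the exponential decay is replaced by the power law $m\sim t^{-1/2}$.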

If one is willing to restrict oneself to the limited class of models (18) (as opposed to the more
general class (11)) and to the more global level of description in terms of $p$ overlap parameters $m_\mu$
instead of $n_\Lambda^p$ sublattice activities $m_\eta$, then there are two rewards. Firstly there will be no restrictions
on the stored pattern components $\xi_i^\mu$ (for instance, they are allowed to be real-valued); secondly the
number $p$ of patterns stored can be much larger for the deterministic autonomous dynamical laws to
hold ($p\ll\sqrt N$ instead of $p\ll\log N$, which from a biological point of view is not impressive).



2.3 Closed Macroscopic Laws for Parallel Dynamics 

We now turn to the parallel dynamics counterpart of (1), i.e. the Markov chain

$$p_{t+1}(\sigma)=\sum_{\sigma'}W[\sigma;\sigma']\,p_t(\sigma'),\qquad W[\sigma;\sigma']=\prod_{i=1}^N\tfrac12\big[1+\sigma_i\tanh[\beta h_i(\sigma')]\big] \qquad (22)$$

(with $\sigma_i\in\{-1,1\}$, and with local fields $h_i(\sigma)$ defined in the usual way). The evolution of macroscopic
probability densities will here be described by discrete mappings, instead of differential equations.

The Toy Model. Let us first see what happens to our previous toy model: $J_{ij}=(J/N)\eta_i\xi_j$ and
$\theta_i=0$. As before we try to describe the dynamics at the (macroscopic) level of the quantity $m(\sigma)=\frac1N\sum_k\xi_k\sigma_k$.
The evolution of the macroscopic probability density $\mathcal{P}_t[m]$ is obtained by inserting (22):

$$\mathcal{P}_{t+1}[m]=\sum_{\sigma\sigma'}\delta[m-m(\sigma)]\,W[\sigma;\sigma']\,p_t(\sigma')=\int dm'\,\tilde W_t[m,m']\,\mathcal{P}_t[m'] \qquad (23)$$

with

$$\tilde W_t[m,m']=\frac{\sum_{\sigma\sigma'}\delta[m-m(\sigma)]\,\delta[m'-m(\sigma')]\,W[\sigma;\sigma']\,p_t(\sigma')}{\sum_{\sigma'}\delta[m'-m(\sigma')]\,p_t(\sigma')}$$

We now insert our expression for the transition probabilities $W[\sigma;\sigma']$ and for the local fields. Since
the fields depend on the microscopic state $\sigma'$ only through $m(\sigma')$, the distribution $p_t(\sigma')$ drops out of
the above expression for $\tilde W_t$, which thereby loses its explicit time-dependence, $\tilde W_t[m,m']\to\tilde W[m,m']$:

$$\tilde W[m,m']=e^{-\sum_i\log\cosh(\beta Jm'\eta_i)}\Big\langle\delta[m-m(\sigma)]\,e^{\beta Jm'\sum_i\eta_i\sigma_i}\Big\rangle_\sigma\qquad\text{with}\quad\langle\ldots\rangle_\sigma=2^{-N}\sum_\sigma\ldots$$

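Inserting the integral representation for the $\delta$-function allows us to perform the average:

$$\tilde W[m,m']=\Big(\frac{\beta N}{2\pi}\Big)\int dk\,e^{N\Psi(m,m',k)},\qquad \Psi=i\beta km+\big\langle\log\cosh\beta[J\eta m'-ik\xi]\big\rangle_{\eta,\xi}-\big\langle\log\cosh\beta[J\eta m']\big\rangle_\eta$$

where $\langle\ldots\rangle_{\eta,\xi}$ denotes an average over the empirical distribution of the site variables $(\eta_i,\xi_i)$.
Since $\tilde W[m,m']$ is (by construction) normalised, $\int dm\,\tilde W[m,m']=1$, we find that for $N\to\infty$ the
expectation value with respect to $\tilde W[m,m']$ of any sufficiently smooth function $f(m)$ will be determined
only by the value $m^*(m')$ of $m$ in the relevant saddle-point of $\Psi$:

$$\int dm\,f(m)\,\tilde W[m,m']=\frac{\int dm\,dk\,f(m)\,e^{N\Psi(m,m',k)}}{\int dm\,dk\,e^{N\Psi(m,m',k)}}\;\to\;f(m^*(m'))\qquad(N\to\infty)$$

Variation of $\Psi$ with respect to $k$ and $m$ gives the two saddle-point equations:

$$m=\big\langle\xi\tanh\beta[J\eta m'-ik\xi]\big\rangle_{\eta,\xi},\qquad k=0$$

We may now conclude that $\lim_{N\to\infty}\tilde W[m,m']=\delta[m-m^*(m')]$ with $m^*(m')=\langle\xi\tanh(\beta J\eta m')\rangle_{\eta,\xi}$,
and that the macroscopic equation (23) becomes:

$$\mathcal{P}_{t+1}[m]=\int dm'\,\delta\big[m-\langle\xi\tanh(\beta J\eta m')\rangle_{\eta,\xi}\big]\,\mathcal{P}_t[m']\qquad(N\to\infty)$$

This describes deterministic evolution. If at $t=0$ we know $m$ exactly, this will remain the case for
finite time-scales, and $m$ will evolve according to a discrete version of the sequential dynamics law (2):

$$m_{t+1}=\big\langle\xi\tanh[\beta J\eta m_t]\big\rangle_{\eta,\xi} \qquad (24)$$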
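A finite-N check of the map (24) is again straightforward. This minimal sketch assumes uniform synapses ($\eta_i=\xi_i=1$), for which (24) reduces to $m_{t+1}=\tanh(\beta Jm_t)$, and implements (22) literally: given the previous state, every neuron independently takes the value $+1$ with probability $\frac12[1+\tanh(\beta h_i)]$. System size, seed and parameters are illustrative.

```python
import math
import random


def parallel_step(spins, beta, J, rng):
    """One step of (22) for J_ij = J/N: all neurons are updated simultaneously,
    each set to +1 with probability (1 + tanh(beta*J*m))/2 using the old m."""
    m = sum(spins) / len(spins)
    p_up = 0.5 * (1.0 + math.tanh(beta * J * m))
    return [1 if rng.random() < p_up else -1 for _ in spins]


def simulate_parallel(N, m0, beta, J, steps, seed=0):
    """Trajectory m(0), m(1), ... of the parallel chain, starting at m(0) = m0."""
    rng = random.Random(seed)
    n_up = int(round(N * (1 + m0) / 2))
    spins = [1] * n_up + [-1] * (N - n_up)
    path = [sum(spins) / N]
    for _ in range(steps):
        spins = parallel_step(spins, beta, J, rng)
        path.append(sum(spins) / N)
    return path


def macroscopic_map(m0, beta, J, steps):
    """Iterate (24) with eta = xi = 1: m_{t+1} = tanh(beta*J*m_t)."""
    path = [m0]
    for _ in range(steps):
        path.append(math.tanh(beta * J * path[-1]))
    return path


sim = simulate_parallel(40000, 0.2, beta=1.5, J=1.0, steps=15, seed=3)
theory = macroscopic_map(0.2, beta=1.5, J=1.0, steps=15)
for t, (a, b) in enumerate(zip(sim, theory)):
    print(t, round(a, 4), round(b, 4))
```

For this toy model the neurons are conditionally independent given $m$, so each simulated step scatters around the map by roughly $N^{-1/2}$.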



Arbitrary Synapses. We now try to generalise the above approach to less trivial classes of models. As
for the sequential case we will find in the limit $N\to\infty$ closed deterministic evolution equations for a
more general set of intensive macroscopic state variables $\Omega(\sigma)=(\Omega_1(\sigma),\ldots,\Omega_n(\sigma))$ if the local fields
$h_i(\sigma)$ depend on the microscopic state $\sigma$ only through the values of $\Omega(\sigma)$, and if the number $n$ of
these state variables necessary to do so is not too large. The evolution of the ensemble probability
density (3) is now obtained by inserting the Markov equation (22):



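$$\mathcal{P}_{t+1}[\Omega]=\int d\Omega'\,\tilde W_t[\Omega,\Omega']\,\mathcal{P}_t[\Omega'] \qquad (25)$$

$$\tilde W_t[\Omega,\Omega']=\Big\langle\Big\langle\delta[\Omega-\Omega(\sigma)]\,e^{\sum_i[\beta\sigma_ih_i(\sigma')-\log\cosh(\beta h_i(\sigma'))]}\Big\rangle_\sigma\Big\rangle_{\Omega';t} \qquad (26)$$

with $\langle\ldots\rangle_\sigma=2^{-N}\sum_\sigma\ldots$, and with the conditional (or sub-shell) average defined as in (5). It is clear
from (26) that in order to find autonomous macroscopic laws, i.e. for the distribution $p_t(\sigma)$ to drop
out, the local fields must depend on the microscopic state $\sigma$ only through the macroscopic quantities
$\Omega(\sigma)$: $h_i(\sigma)=h_i[\Omega(\sigma)]$. In this case $\tilde W_t$ loses its explicit time-dependence, $\tilde W_t[\Omega,\Omega']\to\tilde W[\Omega,\Omega']$.
Inserting integral representations for the $\delta$-functions leads to: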



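$$\tilde W[\Omega,\Omega']=\Big(\frac{\beta N}{2\pi}\Big)^n\int dK\,e^{N\Psi(\Omega,\Omega',K)}$$

$$\Psi=i\beta K\cdot\Omega+\frac1N\log\Big\langle e^{\beta\sum_i\sigma_ih_i[\Omega']-i\beta NK\cdot\Omega(\sigma)}\Big\rangle_\sigma-\frac1N\sum_i\log\cosh[\beta h_i[\Omega']]$$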



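Using the normalisation $\int d\Omega\,\tilde W[\Omega,\Omega']=1$, we can write expectation values with respect to $\tilde W[\Omega,\Omega']$
of macroscopic quantities $f[\Omega]$ as

$$\int d\Omega\,f[\Omega]\,\tilde W[\Omega,\Omega']=\frac{\int d\Omega\,dK\,f[\Omega]\,e^{N\Psi(\Omega,\Omega',K)}}{\int d\Omega\,dK\,e^{N\Psi(\Omega,\Omega',K)}} \qquad (27)$$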



For saddle-point arguments to apply in determining the leading order in $N$ of (27), we encounter
restrictions on the number $n$ of our macroscopic quantities (as expected), since $n$ determines the
dimension of the integrations in (27). The restrictions can be found by expanding $\Psi$ around its
maximum $\Psi^*$. After defining $x=(\Omega,K)$, of dimension $2n$, and after translating the location of the
maximum to the origin, one has

$$\Psi(x)=\Psi^*-\frac12\sum_{\mu\nu}x_\mu x_\nu H_{\mu\nu}+\sum_{\mu\nu\rho}x_\mu x_\nu x_\rho L_{\mu\nu\rho}+\mathcal{O}(x^4)$$



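giving

$$\frac{\int dx\,g(x)\,e^{N\Psi(x)}}{\int dx\,e^{N\Psi(x)}}-g(0)=\frac{\int dx\,[g(x)-g(0)]\,e^{-\frac12Nx\cdot Hx+N\sum_{\mu\nu\rho}x_\mu x_\nu x_\rho L_{\mu\nu\rho}+\mathcal{O}(Nx^4)}}{\int dx\,e^{-\frac12Nx\cdot Hx+N\sum_{\mu\nu\rho}x_\mu x_\nu x_\rho L_{\mu\nu\rho}+\mathcal{O}(Nx^4)}}$$
$$=\frac{\int dy\,[g(y/\sqrt N)-g(0)]\,e^{-\frac12y\cdot Hy+\sum_{\mu\nu\rho}y_\mu y_\nu y_\rho L_{\mu\nu\rho}/\sqrt N+\mathcal{O}(y^4/N)}}{\int dy\,e^{-\frac12y\cdot Hy+\sum_{\mu\nu\rho}y_\mu y_\nu y_\rho L_{\mu\nu\rho}/\sqrt N+\mathcal{O}(y^4/N)}}$$
$$=\frac{\int dy\,e^{-\frac12y\cdot Hy}\big[N^{-\frac12}y\cdot\nabla g(0)+\mathcal{O}(y^2/N)\big]\big[1+\sum_{\mu\nu\rho}y_\mu y_\nu y_\rho L_{\mu\nu\rho}/\sqrt N+\mathcal{O}(y^6/N)\big]}{\int dy\,e^{-\frac12y\cdot Hy}\big[1+\sum_{\mu\nu\rho}y_\mu y_\nu y_\rho L_{\mu\nu\rho}/\sqrt N+\mathcal{O}(y^6/N)\big]}$$
$$=\mathcal{O}(n^2/N)+\mathcal{O}(n^4/N^2)+\text{non-dominant terms}\qquad(N,n\to\infty)$$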
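with $H$ denoting the Hessian (curvature) matrix of the surface $\Psi$ at its extremum $\Psi^*$. We thus find

$$\lim_{N\to\infty}n/\sqrt N=0:\qquad\lim_{N\to\infty}\int d\Omega\,f[\Omega]\,\tilde W[\Omega,\Omega']=f[\Omega^*(\Omega')]$$

where $\Omega^*(\Omega')$ denotes the value of $\Omega$ in the relevant saddle-point of $\Psi$. Variation of $\Psi$ with
respect to $\Omega$ and $K$ gives the saddle-point equations:

$$\Omega=\frac{\big\langle\Omega(\sigma)\,e^{\beta\sum_i\sigma_ih_i[\Omega']-i\beta NK\cdot\Omega(\sigma)}\big\rangle_\sigma}{\big\langle e^{\beta\sum_i\sigma_ih_i[\Omega']-i\beta NK\cdot\Omega(\sigma)}\big\rangle_\sigma},\qquad K=0$$

We may now conclude that $\lim_{N\to\infty}\tilde W[\Omega,\Omega']=\delta[\Omega-\Omega^*(\Omega')]$, with

$$\Omega^*(\Omega')=\frac{\big\langle\Omega(\sigma)\,e^{\beta\sum_i\sigma_ih_i[\Omega']}\big\rangle_\sigma}{\big\langle e^{\beta\sum_i\sigma_ih_i[\Omega']}\big\rangle_\sigma}$$

and that for $N\to\infty$ the macroscopic equation (25) becomes $\mathcal{P}_{t+1}[\Omega]=\int d\Omega'\,\delta[\Omega-\Omega^*(\Omega')]\,\mathcal{P}_t[\Omega']$.
This relation again describes deterministic evolution. If at $t=0$ we know $\Omega$ exactly, this will remain
the case for finite time-scales and $\Omega$ will evolve according to

$$\Omega(t+1)=\frac{\big\langle\Omega(\sigma)\,e^{\beta\sum_i\sigma_ih_i[\Omega(t)]}\big\rangle_\sigma}{\big\langle e^{\beta\sum_i\sigma_ih_i[\Omega(t)]}\big\rangle_\sigma} \qquad (28)$$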
As with the sequential case, in taking the limit $N\to\infty$ we have to keep in mind that the resulting laws apply to finite $t$, and that for sufficiently large times terms of higher order in $N$ do come into play. As for the sequential case, a more rigorous and tedious analysis shows that the restriction $n/\sqrt{N}\to 0$ can in fact be weakened to $n/N\to 0$. Finally, for macroscopic quantities $\Omega(\sigma)$ which are linear in $\sigma$, the remaining $\sigma$-averages become trivial, so that:

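$$\Omega_\mu(\sigma) = \frac{1}{N}\sum_i \omega_{\mu i}\sigma_i: \qquad \Omega_\mu(t+1) = \lim_{N\to\infty}\frac{1}{N}\sum_i \omega_{\mu i}\,\langle \tanh[\beta h_i(\sigma)]\rangle_{\Omega(t);t} \qquad (29)$$

(to be compared with (10), as derived for sequential dynamics).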
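The saddle-point estimate behind (27) can be checked numerically in the simplest case $n = 1$. The sketch below is only an illustration: it replaces the $\Psi$ of (27) (which also involves the conjugate variables $K$) by the hypothetical exponent $\Psi(x) = \log x - x$, which is maximal at $x^* = 1$ and for which $\int dx\, x\, e^{N\Psi}/\int dx\, e^{N\Psi} = (N+1)/N$ exactly. The deviation from $g(x^*)$ should therefore shrink as $1/N$, in line with the $\mathcal{O}(n^2/N)$ estimate for fixed $n$.

```python
import numpy as np


def trapezoid(y, x):
    """Plain trapezoidal rule (avoids relying on the np.trapz / np.trapezoid naming)."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))


def saddle_point_average(N, g=lambda x: x, x_max=6.0, points=400_001):
    """Return int g(x) e^{N Psi(x)} dx / int e^{N Psi(x)} dx for the toy choice Psi(x) = log(x) - x.

    N * Psi(x*) = -N is subtracted before exponentiating, so the weights stay
    finite for large N (the constant cancels in the ratio).
    """
    x = np.linspace(1e-9, x_max, points)
    weights = np.exp(N * (np.log(x) - x + 1.0))
    return trapezoid(g(x) * weights, x) / trapezoid(weights, x)


for N in (10, 100, 1000):
    avg = saddle_point_average(N)
    # deviation from g(x*) = 1, next to the exact value 1/N
    print(N, avg - 1.0, 1.0 / N)
```

The subtraction of $N\Psi(x^*)$ plays the same role as the translation of the maximum to the origin in the derivation above: only the neighbourhood of the maximum, of width $\mathcal{O}(N^{-1/2})$, contributes.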



2.4 Application to Separable Attractor Networks 

Separable models: Sublattice Activities and Overlaps. The separable attractor models, described at the level of sublattice activities, indeed have the property that all local fields can be written in terms of the macroscopic observables. What remains to ensure deterministic evolution is meeting the condition on the number of sublattices. If all relative sublattice sizes $p_\eta$ are of the same order in $N$ (as for randomly drawn patterns) this condition again translates into $\lim_{N\to\infty} p/\log N = 0$ (as for sequential dynamics). Since the sublattice activities are linear functions of the $\sigma_i$, their evolution in time is governed by equation (29), which acquires the form:

$$m_\eta(t+1) = \tanh\Big[\beta\sum_{\eta'} p_{\eta'}\, Q(\eta;\eta')\, m_{\eta'}(t)\Big] \qquad (30)$$



As for sequential dynamics, symmetry of the interaction matrix does not play a role. 
Figure 4: Evolution of overlaps $m_\mu(\sigma)$, obtained by numerical iteration of the macroscopic parallel dynamics laws (31), for the synapses $J_{ij} = \frac{\nu}{N}\sum_\mu \xi_i^\mu\xi_j^\mu + \frac{1-\nu}{N}\sum_\mu \xi_i^{\mu+1}\xi_j^\mu$, with $p = 10$ and $T = 0.5$.



At the more global level of overlaps $m_\mu(\sigma) = N^{-1}\sum_i \xi_i^\mu\sigma_i$ we, in turn, obtain autonomous deterministic laws if the local fields $h_i(\sigma)$ can be expressed in terms of $\boldsymbol{m}(\sigma)$ only, as for the models (18) (or, more generally, for all models in which the interactions are of the form $J_{ij} = \sum_{\mu\leq p} f_{i\mu}\xi_j^\mu$):

with the following restriction on the number $p$ of embedded patterns: $\lim_{N\to\infty} p/\sqrt{N} = 0$ (as with sequential dynamics). For the bi-linear models (18), the evolution in time of the overlap vector $\boldsymbol{m}$ (which depends linearly on the $\sigma_i$) is governed by (29), which now translates into the iterative map:

$$\boldsymbol{m}(t+1) = \langle \boldsymbol{\xi}\tanh[\beta\,\boldsymbol{\xi}\cdot A\boldsymbol{m}(t)]\rangle_{\boldsymbol{\xi}} \qquad (31)$$

with $p(\boldsymbol{\xi})$ as defined in (20). Again symmetry of the synapses is not required. For parallel dynamics it is far more difficult than for sequential dynamics to construct Lyapunov functions and to prove that the macroscopic laws (31) for symmetric systems evolve towards a stable fixed point (as one would expect), but it can still be done. For non-symmetric systems the macroscopic laws (31) can in principle display all the interesting, but complicated, phenomena of non-conservative non-linear systems. Nevertheless, it is also not uncommon that the equations (31) for non-symmetric systems can be mapped by a time-dependent transformation onto the equations for related symmetric systems (mostly variants of the original Hopfield model).

As an example we show in figure 4, as functions of time, the values of the overlaps $\{m_\mu\}$ for $p = 10$ and $T = 0.5$, resulting from numerical iteration of the macroscopic laws (31) for the model

$$J_{ij} = \frac{\nu}{N}\sum_\mu \xi_i^\mu\xi_j^\mu + \frac{1-\nu}{N}\sum_\mu \xi_i^{\mu+1}\xi_j^\mu \qquad (\mu: \text{mod } p)$$

i.e. $A_{\lambda\rho} = \nu\delta_{\lambda\rho} + (1-\nu)\delta_{\lambda,\rho+1}$ ($\lambda,\rho$: mod $p$), with randomly drawn pattern bits $\xi_i^\mu\in\{-1,1\}$. The initial state is chosen to be the pure state $m_\mu = \delta_{\mu,1}$. At intervals of $\Delta t = 20$ iterations the parameter $\nu$ is reduced in $\Delta\nu = 0.25$ steps from $\nu = 1$ (where one recovers the symmetric Hopfield model) to $\nu = 0$ (where one obtains a non-symmetric model which processes the $p$ embedded patterns in strict sequential order as a period-$p$ limit-cycle). The analysis of the equations (31) for the pure sequence processing case $\nu = 0$ is greatly simplified by mapping the model onto the ordinary ($\nu = 1$) Hopfield model, using the index permutation symmetries of the present pattern distribution, as follows (all pattern indices are periodic, mod $p$). Define $m_\mu(t) = M_{\mu-t}(t)$; now

$$M_\mu(t+1) = \Big\langle \xi_{\mu+t+1}\tanh\Big[\beta\sum_\rho \xi_{\rho+1}M_{\rho-t}(t)\Big]\Big\rangle_{\boldsymbol{\xi}} = \langle \xi_\mu\tanh[\beta\,\boldsymbol{\xi}\cdot\boldsymbol{M}(t)]\rangle_{\boldsymbol{\xi}}$$

We can now immediately infer, in particular, that to each stable macroscopic fixed-point attractor of the original Hopfield model corresponds a stable period-$p$ macroscopic limit-cycle attractor in the $\nu = 0$ sequence processing model (e.g. pure states $\leftrightarrow$ pure sequences, mixture states $\leftrightarrow$ mixture sequences), with identical amplitude as a function of the noise level. Figure 4 shows for $\nu = 0$ (i.e. $t > 80$) a relaxation towards such a pure sequence.
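A minimal sketch of the iteration behind figure 4 follows. It assumes that the average $\langle\cdot\rangle_{\boldsymbol\xi}$ over randomly drawn patterns is evaluated exactly, as a uniform average over all $2^p$ vectors $\boldsymbol\xi\in\{-1,1\}^p$, and it uses the $\nu$ schedule described above; the helper names are hypothetical, and pattern index $\mu = 1$ of the text is index 0 in the code. The last lines check the mapping used above: at $\nu = 0$ one step of (31) coincides with one Hopfield step ($\nu = 1$) applied to the overlaps shifted by one pattern index.

```python
import itertools

import numpy as np


def all_patterns(p):
    """All 2^p vectors xi in {-1,1}^p; averaging over them gives <.>_xi for random patterns."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=p)))


def coupling_matrix(p, nu):
    """A[lam, rho] = nu * delta_{lam,rho} + (1 - nu) * delta_{lam,rho+1}, indices mod p."""
    shift = np.roll(np.eye(p), 1, axis=0)  # shift[lam, lam-1] = 1
    return nu * np.eye(p) + (1.0 - nu) * shift


def iterate_overlaps(m0, beta, nu_schedule, xi):
    """Iterate m(t+1) = < xi tanh(beta xi . A m(t)) >_xi, eq. (31), one step per nu value."""
    m = np.asarray(m0, dtype=float)
    history = [m.copy()]
    for nu in nu_schedule:
        A = coupling_matrix(len(m), nu)
        fields = xi @ (A @ m)  # xi . A m for every pattern vector xi
        m = (xi * np.tanh(beta * fields)[:, None]).mean(axis=0)
        history.append(m.copy())
    return np.array(history)


p, T = 10, 0.5
xi = all_patterns(p)
schedule = [1.0] * 20 + [0.75] * 20 + [0.5] * 20 + [0.25] * 20 + [0.0] * 40
trajectory = iterate_overlaps(np.eye(p)[0], 1.0 / T, schedule, xi)  # pure state m_mu = delta_{mu,1}
print(np.round(trajectory[[20, 80, 120]], 3))

# Mapping check: one nu = 0 step equals one Hopfield step on the shifted overlaps
m_test = np.random.default_rng(0).uniform(-0.5, 0.5, p)
step_sequence = iterate_overlaps(m_test, 1.0 / T, [0.0], xi)[1]
step_hopfield = iterate_overlaps(np.roll(m_test, 1), 1.0 / T, [1.0], xi)[1]
print(np.max(np.abs(step_sequence - step_hopfield)))
```

The exact average over $\{-1,1\}^p$ is cheap for $p = 10$ (1024 terms); for much larger $p$ one would have to switch to sampling, which this sketch does not attempt.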

Finally we note that the fixed points of the macroscopic equations (14) and (20) (derived for sequential dynamics) are identical to those of (30) and (31) (derived for parallel dynamics). The stability properties of these fixed points, however, need not be the same, and have to be assessed on a case-by-case basis. For the Hopfield model, i.e. equations (20) and (31) with $A_{\mu\nu} = \delta_{\mu\nu}$, they are found to be the same, but already for $A_{\mu\nu} = -\delta_{\mu\nu}$ the two types of dynamics would behave differently.



3 Attractor Neural Networks with Continuous Neurons 



3.1 Closed Macroscopic Laws 

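General Derivation. We have seen in the preceding paper on statics that models of recurrent neural networks with continuous neural variables (e.g. graded response neurons or coupled oscillators) can often be described by a Fokker-Planck equation for the microscopic state probability density $p_t(\sigma)$:

$$\frac{\partial}{\partial t}p_t(\sigma) = -\sum_i\frac{\partial}{\partial\sigma_i}\left[p_t(\sigma)f_i(\sigma)\right] + T\sum_i\frac{\partial^2}{\partial\sigma_i^2}p_t(\sigma) \qquad (32)$$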



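Averages over $p_t(\sigma)$ are denoted by $\langle G\rangle = \int d\sigma\, p_t(\sigma)G(\sigma,t)$. From (32) one obtains directly (through integration by parts) an equation for the time derivative of averages:

$$\frac{d}{dt}\langle G\rangle = \left\langle\frac{\partial G}{\partial t}\right\rangle + \sum_i\left\langle\left[f_i(\sigma) + T\frac{\partial}{\partial\sigma_i}\right]\frac{\partial G}{\partial\sigma_i}\right\rangle \qquad (33)$$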



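In particular, if we apply (33) to $G(\sigma,t) = \delta[\Omega-\Omega(\sigma)]$, for any set of macroscopic observables $\Omega(\sigma) = (\Omega_1(\sigma),\ldots,\Omega_n(\sigma))$ (in the spirit of the previous section), we obtain a dynamic equation for the macroscopic probability density $P_t(\Omega) = \langle\delta[\Omega-\Omega(\sigma)]\rangle$, which is again of the Fokker-Planck form:

$$\frac{\partial}{\partial t}P_t(\Omega) = -\sum_\mu\frac{\partial}{\partial\Omega_\mu}\left\{P_t(\Omega)\left\langle\sum_i\left[f_i(\sigma) + T\frac{\partial}{\partial\sigma_i}\right]\frac{\partial\Omega_\mu(\sigma)}{\partial\sigma_i}\right\rangle_{\Omega;t}\right\} + T\sum_{\mu\nu}\frac{\partial^2}{\partial\Omega_\mu\partial\Omega_\nu}\left\{P_t(\Omega)\left\langle\sum_i\frac{\partial\Omega_\mu(\sigma)}{\partial\sigma_i}\frac{\partial\Omega_\nu(\sigma)}{\partial\sigma_i}\right\rangle_{\Omega;t}\right\} \qquad (34)$$

with the conditional (or sub-shell) averages:

$$\langle G(\sigma)\rangle_{\Omega;t} = \frac{\int d\sigma\, p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)]\,G(\sigma)}{\int d\sigma\, p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)]} \qquad (35)$$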

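From (34) we infer that a sufficient condition for the observables $\Omega(\sigma)$ to evolve in time deterministically (i.e. for having vanishing diffusion matrix elements in (34)) in the limit $N\to\infty$ is

$$\lim_{N\to\infty}\left\langle\sum_i\Big[\sum_\mu\Big|\frac{\partial\Omega_\mu(\sigma)}{\partial\sigma_i}\Big|\Big]^2\right\rangle_{\Omega;t} = 0 \qquad (36)$$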
If (36) holds, the macroscopic Fokker-Planck equation (34) reduces for $N\to\infty$ to a Liouville equation, and the observables $\Omega(\sigma)$ will evolve in time according to the coupled deterministic equations:

$$\frac{d}{dt}\Omega_\mu = \lim_{N\to\infty}\left\langle\sum_i\left[f_i(\sigma) + T\frac{\partial}{\partial\sigma_i}\right]\frac{\partial\Omega_\mu(\sigma)}{\partial\sigma_i}\right\rangle_{\Omega;t} \qquad (37)$$



The deterministic macroscopic equation (37), together with its associated condition for validity (36), will form the basis for the subsequent analysis.

Closure: A Toy Model Again. The general derivation given above went smoothly. However, the equations (37) are not yet closed. It turns out that to achieve closure even for simple continuous networks we can no longer get away with just a finite (small) number of macroscopic observables (as we could with binary neurons). This I will now illustrate with a simple toy network of graded response neurons:

$$\frac{d}{dt}u_i(t) = \sum_j J_{ij}\,g[u_j(t)] - u_i(t) + \eta_i(t) \qquad (38)$$



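with $g[z] = \frac{1}{2}[\tanh(\gamma z)+1]$ and with the standard Gaussian white noise $\eta_i(t)$. In the language of (32) this means $f_i(\boldsymbol{u}) = \sum_j J_{ij}g[u_j] - u_i$. We choose uniform synapses $J_{ij} = J/N$, so $f_i(\boldsymbol{u}) = (J/N)\sum_j g[u_j] - u_i$. If (36) were to hold, we would find the deterministic macroscopic laws

$$\frac{d}{dt}\Omega_\mu = \lim_{N\to\infty}\left\langle\sum_i\left[\frac{J}{N}\sum_j g[u_j] - u_i + T\frac{\partial}{\partial u_i}\right]\frac{\partial\Omega_\mu(\boldsymbol{u})}{\partial u_i}\right\rangle_{\Omega;t} \qquad (39)$$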



In contrast to similar models with binary neurons, choosing as our macroscopic level of description $\Omega(\boldsymbol{u})$ again simply the average $m(\boldsymbol{u}) = N^{-1}\sum_i u_i$ now leads to an equation which fails to close:

$$\frac{d}{dt}m = \lim_{N\to\infty}\frac{J}{N}\sum_j g[u_j] - m$$

The term $N^{-1}\sum_j g[u_j]$ cannot be written as a function of $N^{-1}\sum_i u_i$. We might be tempted to try dealing with this problem by just including the offending term in our macroscopic set, and choose $\Omega(\boldsymbol{u}) = (N^{-1}\sum_i u_i,\ N^{-1}\sum_i g[u_i])$. This would indeed solve our closure problem for the $m$-equation, but we would now find a new closure problem in the equation for the newly introduced observable. The only way out is to choose an observable function, namely the distribution of potentials

$$\rho(u;\boldsymbol{u}) = \frac{1}{N}\sum_i\delta[u-u_i], \qquad \rho(u) = \langle\rho(u;\boldsymbol{u})\rangle \qquad (40)$$



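This is to be done with care, in view of our restriction on the number of observables: we evaluate (40) at first only for $n$ specific values $u_\mu$ and take the limit $n\to\infty$ only after the limit $N\to\infty$. Thus we define

$$\Omega_\mu(\boldsymbol{u}) = \frac{1}{N}\sum_i\delta[u_\mu-u_i], \qquad \mu = 1,\ldots,n$$

Condition (36) then reduces to the familiar expression $\lim_{N\to\infty} n/\sqrt{N} = 0$, and we get for $N\to\infty$ and $n\to\infty$ (taken in that order) from (39) a diffusion equation for the distribution of membrane potentials (describing a so-called 'time-dependent Ornstein-Uhlenbeck process'):

$$\frac{\partial}{\partial t}\rho_t(u) = -\frac{\partial}{\partial u}\left\{\rho_t(u)\left[J\int du'\,\rho_t(u')\,g[u'] - u\right]\right\} + T\frac{\partial^2}{\partial u^2}\rho_t(u) \qquad (41)$$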



The natural solution of (41) is the Gaussian distribution (for non-Gaussian initial conditions $\rho_0(u)$ the solution of (41) would in time converge towards the Gaussian solution)

$$\rho_t(u) = [2\pi\Sigma^2(t)]^{-1/2}\, e^{-\frac{1}{2}[u-\bar{u}(t)]^2/\Sigma^2(t)} \qquad (42)$$

in which $\Sigma = [T + (\Sigma_0^2 - T)e^{-2t}]^{1/2}$, and $\bar{u}$ evolves in time according to

$$\frac{d}{dt}\bar{u} = J\int Dz\, g[\bar{u} + \Sigma z] - \bar{u} \qquad (43)$$



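(with $Dz = (2\pi)^{-1/2}e^{-\frac{1}{2}z^2}dz$). We can now also calculate the distribution $p(s)$ of neuronal firing activities $s_i = g[u_i]$ at any time:

$$p(s) = \int du\,\rho(u)\,\delta[s-g[u]]$$

For our choice $g[z] = \frac{1}{2} + \frac{1}{2}\tanh[\gamma z]$ we have $g^{\rm inv}[s] = \frac{1}{2\gamma}\log[s/(1-s)]$, so in combination with (42):

$$0 < s < 1: \qquad p(s) = \frac{e^{-\frac{1}{2}[(2\gamma)^{-1}\log[s/(1-s)]-\bar{u}]^2/\Sigma^2}}{\int_0^1 ds'\, e^{-\frac{1}{2}[(2\gamma)^{-1}\log[s'/(1-s')]-\bar{u}]^2/\Sigma^2}} \qquad (44)$$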



The results of solving and integrating (43) and (44) numerically are shown in figure 5, for Gaussian initial conditions (42) with $\bar{u}_0 = 0$ and $\Sigma_0 = 1$, and with parameters $\gamma = J = 1$ and different noise levels $T$. For low noise levels we find high average membrane potentials, low membrane potential variance, and high firing rates; for high noise levels the picture changes to lower average membrane potentials, higher potential variance, and uniformly distributed (noise-dominated) firing activities. The extreme cases $T = 0$ and $T = \infty$ are easily extracted from our equations. For $T = 0$ one finds $\Sigma(t) = \Sigma_0 e^{-t}$ and $\frac{d}{dt}\bar{u} = Jg[\bar{u}] - \bar{u}$. This leads to a final state where $\bar{u} = \frac{1}{2}J + \frac{1}{2}J\tanh[\gamma\bar{u}]$ and where $p(s) = \delta[s - \bar{u}/J]$. For $T = \infty$ one finds $\Sigma = \infty$ (for any $t > 0$) and $\frac{d}{dt}\bar{u} = \frac{1}{2}J - \bar{u}$. This leads to a final state where $\bar{u} = \frac{1}{2}J$ and where $p(s) = 1$ for all $0 < s < 1$.

None of the above results (not even those on the stationary state) could have been obtained within equilibrium statistical mechanics, since any network of connected graded response neurons will violate detailed balance (as shown in the preceding paper on statics). Secondly, there appears to be a qualitative difference between simple networks (e.g. $J_{ij} = J/N$) of binary neurons versus those of continuous neurons, in terms of the types of macroscopic observables needed for deriving closed deterministic laws: a single number $m = N^{-1}\sum_i\sigma_i$ versus a distribution $\rho(\sigma) = N^{-1}\sum_i\delta[\sigma-\sigma_i]$. Note, however, that in the binary case the latter distribution would in fact have been characterised fully by a single number, the average $m$, since $\rho(\sigma) = \frac{1}{2}[1+m]\delta[\sigma-1] + \frac{1}{2}[1-m]\delta[\sigma+1]$. In other words, there we were just lucky.
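The closed equations (42) and (43) for the toy model can be integrated with a few lines of code. The sketch below is one possible implementation, assuming a simple Euler scheme for $\bar u$, the closed form for $\Sigma(t)$, and Gauss-Hermite quadrature for the Gaussian measure $Dz$; the function names, step size and quadrature order are illustrative choices. For $T = 0$ it should relax to the fixed point $\bar u = Jg[\bar u]$ quoted above.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: int Dz f(z) ~ sum_k W[k] f(Z[k]) after normalising by sqrt(2 pi)
Z, W = hermegauss(60)
W = W / np.sqrt(2.0 * np.pi)


def g(z, gamma=1.0):
    """Graded-response non-linearity g[z] = (1/2)[tanh(gamma z) + 1]."""
    return 0.5 * (np.tanh(gamma * z) + 1.0)


def sigma_of_t(t, T, sigma0):
    """Sigma(t) = [T + (Sigma_0^2 - T) e^{-2t}]^{1/2}."""
    return np.sqrt(T + (sigma0**2 - T) * np.exp(-2.0 * t))


def integrate_mean_potential(T, J=1.0, gamma=1.0, u0=0.0, sigma0=1.0, t_max=40.0, dt=1e-2):
    """Euler integration of (43): d u_bar/dt = J int Dz g[u_bar + Sigma z] - u_bar."""
    u_bar = u0
    for k in range(int(round(t_max / dt))):
        sigma = sigma_of_t(k * dt, T, sigma0)
        u_bar += dt * (J * np.sum(W * g(u_bar + sigma * Z, gamma)) - u_bar)
    return u_bar


u_cold = integrate_mean_potential(T=0.0)  # gamma = J = 1, u0 = 0, Sigma_0 = 1 as in the text
u_warm = integrate_mean_potential(T=1.0)
print(u_cold, u_cold - g(u_cold), u_warm)
```

With $T = 1$ the mean potential settles below its $T = 0$ value, consistent with the trend described above; the distribution (44) can then be evaluated from the final $\bar u$ and $\Sigma$.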



3.2 Application to Graded Response Attractor Networks 

Derivation of Closed Macroscopic Laws. I will now turn to attractor networks with graded response neurons of the type (38), in which $p$ binary patterns $\boldsymbol{\xi}^\mu = (\xi_1^\mu,\ldots,\xi_N^\mu)\in\{-1,1\}^N$ have been stored via separable Hebbian-type synapses $J_{ij} = (2/N)\sum_{\mu\nu=1}^p \xi_i^\mu A_{\mu\nu}\xi_j^\nu$ (the extra factor 2 is inserted for future convenience). Adding suitable thresholds $\theta_i = -\frac{1}{2}\sum_j J_{ij}$ to the right-hand sides of (38), and choosing the non-linearity $g[z] = \frac{1}{2}(1+\tanh[\gamma z])$, would then give us

$$\frac{d}{dt}u_i(t) = \sum_{\mu\nu}\xi_i^\mu A_{\mu\nu}\frac{1}{N}\sum_j\xi_j^\nu\tanh[\gamma u_j(t)] - u_i(t) + \eta_i(t)$$



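so the deterministic forces are $f_i(\boldsymbol{u}) = N^{-1}\sum_{\mu\nu}\xi_i^\mu A_{\mu\nu}\sum_j\xi_j^\nu\tanh[\gamma u_j] - u_i$. Choosing our macroscopic observables $\Omega(\boldsymbol{u})$ such that (36) holds would lead to the deterministic macroscopic laws

$$\frac{d}{dt}\Omega_\mu = \lim_{N\to\infty}\left\langle\sum_i\left[\frac{1}{N}\sum_{\lambda\rho}\xi_i^\lambda A_{\lambda\rho}\sum_j\xi_j^\rho\tanh[\gamma u_j] - u_i\right]\frac{\partial\Omega_\mu(\boldsymbol{u})}{\partial u_i}\right\rangle_{\Omega;t} + \lim_{N\to\infty}T\left\langle\sum_i\frac{\partial^2\Omega_\mu(\boldsymbol{u})}{\partial u_i^2}\right\rangle_{\Omega;t} \qquad (45)$$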
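As with the uniform synapses case, the main problem to be dealt with is how to choose the $\Omega_\mu(\boldsymbol{u})$ such that (45) closes. It turns out that the canonical choice is to turn to the distributions of membrane potentials within each of the $2^p$ sub-lattices introduced earlier:

$$I_\eta = \{i\,|\,\boldsymbol{\xi}_i = \eta\}: \qquad \rho_\eta(u;\boldsymbol{u}) = \frac{1}{|I_\eta|}\sum_{i\in I_\eta}\delta[u-u_i], \qquad \rho_\eta(u) = \langle\rho_\eta(u;\boldsymbol{u})\rangle \qquad (46)$$

with $\eta\in\{-1,1\}^p$ and $\lim_{N\to\infty}|I_\eta|/N = p_\eta$. Again we evaluate the distributions in (46) at first only for $n$ specific values and send $n\to\infty$ after $N\to\infty$. Now condition (36) reduces to $\lim_{N\to\infty}2^p/\sqrt{N} = 0$. We will keep $p$ finite, for simplicity. Using identities such as $\sum_i\ldots = \sum_\eta\sum_{i\in I_\eta}\ldots$ and

$$i\in I_\eta: \qquad \frac{\partial}{\partial u_i}\rho_\eta(u;\boldsymbol{u}) = -|I_\eta|^{-1}\frac{\partial}{\partial u}\delta[u-u_i], \qquad \frac{\partial^2}{\partial u_i^2}\rho_\eta(u;\boldsymbol{u}) = |I_\eta|^{-1}\frac{\partial^2}{\partial u^2}\delta[u-u_i]$$

we then obtain for $N\to\infty$ and $n\to\infty$ (taken in that order) from equation (45) $2^p$ coupled diffusion equations for the distributions $\rho_\eta(u)$ of membrane potentials in each of the $2^p$ sub-lattices $I_\eta$:

$$\frac{\partial}{\partial t}\rho_\eta(u) = -\frac{\partial}{\partial u}\left\{\rho_\eta(u)\left[\sum_{\eta'}p_{\eta'}(\eta\cdot A\eta')\int du'\,\rho_{\eta'}(u')\tanh[\gamma u'] - u\right]\right\} + T\frac{\partial^2}{\partial u^2}\rho_\eta(u) \qquad (47)$$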



Equation (47) is the basis for our further analysis. It can be simplified only if we make additional assumptions on the system's initial conditions, such as $\delta$-distributed or Gaussian distributed $\rho_\eta(u)$ at $t = 0$ (see below); otherwise it will have to be solved numerically.



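Reduction to the Level of Pattern Overlaps. It is clear that (47) is again of the time-dependent Ornstein-Uhlenbeck form, and will thus again have Gaussian solutions as the natural ones:

$$\rho_{t,\eta}(u) = [2\pi\Sigma_\eta^2(t)]^{-1/2}\, e^{-\frac{1}{2}[u-\bar{u}_\eta(t)]^2/\Sigma_\eta^2(t)} \qquad (48)$$

in which $\Sigma_\eta(t) = [T + (\Sigma_\eta^2(0) - T)e^{-2t}]^{1/2}$, and with the $\bar{u}_\eta(t)$ evolving in time according to

$$\frac{d}{dt}\bar{u}_\eta = \sum_{\eta'}p_{\eta'}(\eta\cdot A\eta')\int Dz\,\tanh[\gamma(\bar{u}_{\eta'} + \Sigma_{\eta'}z)] - \bar{u}_\eta \qquad (49)$$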

Our problem has thus been reduced successfully to the study of the $2^p$ coupled scalar equations (49). We can also measure the correlation between the firing activities $s_i(u_i) = \frac{1}{2}[1+\tanh(\gamma u_i)]$ and the pattern components (similar to the overlaps in the case of binary neurons). If the pattern bits are drawn at random, i.e. $\lim_{N\to\infty}|I_\eta|/N = p_\eta = 2^{-p}$ for all $\eta$, we can define a 'graded response' equivalent $m_\mu(\boldsymbol{u}) = 2N^{-1}\sum_i\xi_i^\mu s_i(u_i)\in[-1,1]$ of the pattern overlaps:

$$m_\mu(\boldsymbol{u}) = \frac{2}{N}\sum_i\xi_i^\mu s_i(u_i) = \frac{1}{N}\sum_i\xi_i^\mu\tanh(\gamma u_i) + \mathcal{O}(N^{-1/2}) = \sum_\eta p_\eta\,\eta_\mu\int du\,\rho_\eta(u;\boldsymbol{u})\tanh(\gamma u) + \mathcal{O}(N^{-1/2}) \qquad (50)$$



Full recall of pattern $\mu$ implies $s_i(u_i) = \frac{1}{2}[\xi_i^\mu + 1]$, giving $m_\mu(\boldsymbol{u}) = 1$. Since the distributions $\rho_\eta(u)$ obey deterministic laws for $N\to\infty$, the same will be true for the overlaps $\boldsymbol{m} = (m_1,\ldots,m_p)$. For the Gaussian solutions (48) of (47) we can now proceed to replace the $2^p$ macroscopic laws (49), which reduce to $\frac{d}{dt}\bar{u}_\eta = \eta\cdot A\boldsymbol{m} - \bar{u}_\eta$ and give $\bar{u}_\eta = \bar{u}_\eta(0)e^{-t} + \eta\cdot A\int_0^t ds\, e^{s-t}\boldsymbol{m}(s)$, by $p$ integral equations in terms of overlaps only:

$$m_\mu(t) = \sum_\eta p_\eta\,\eta_\mu\int Dz\,\tanh\left[\gamma\left(\bar{u}_\eta(0)e^{-t} + \eta\cdot A\int_0^t ds\, e^{s-t}\boldsymbol{m}(s) + z\sqrt{T + (\Sigma_\eta^2(0)-T)e^{-2t}}\right)\right] \qquad (51)$$

with $Dz = (2\pi)^{-1/2}e^{-\frac{1}{2}z^2}dz$. Here the sub-lattices only come in via the initial conditions.

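Extracting the Physics from the Macroscopic Laws. The equations describing the asymptotic (stationary) state can be written entirely without sub-lattices, by taking the $t\to\infty$ limit in (51), using $\bar{u}_\eta\to\eta\cdot A\boldsymbol{m}$, $\Sigma_\eta\to\sqrt{T}$, and the familiar notation $\langle g(\boldsymbol{\xi})\rangle_{\boldsymbol{\xi}} = \lim_{N\to\infty}\frac{1}{N}\sum_i g(\boldsymbol{\xi}_i) = 2^{-p}\sum_{\boldsymbol{\xi}\in\{-1,1\}^p}g(\boldsymbol{\xi})$:

$$\boldsymbol{m} = \left\langle\boldsymbol{\xi}\int Dz\,\tanh[\gamma(\boldsymbol{\xi}\cdot A\boldsymbol{m} + z\sqrt{T})]\right\rangle_{\boldsymbol{\xi}}, \qquad \rho_\eta(u) = [2\pi T]^{-1/2}e^{-\frac{1}{2}[u-\eta\cdot A\boldsymbol{m}]^2/T} \qquad (52)$$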

Note the appealing similarity with previous results on networks with binary neurons in equilibrium (see the preceding paper on statics). For $T = 0$ the overlap equations (52) become identical to those found for attractor networks with binary neurons and finite $p$ (hence our choice to insert an extra factor 2 in defining the synapses), with $\gamma$ replacing the inverse noise level $\beta$ in the former.

For the simplest non-trivial choice $A_{\mu\nu} = \delta_{\mu\nu}$ (i.e. $J_{ij} = (2/N)\sum_\mu\xi_i^\mu\xi_j^\mu$, as in the Hopfield model) equation (52) yields the familiar pure and mixture state solutions. For $T = 0$ we find a continuous phase transition from non-recall to pure states of the form $m_\mu = m\delta_{\mu\nu}$ (for some $\nu$) at $\gamma_c = 1$. For $T > 0$ we have in (52) an additional Gaussian noise, absent in the models with binary neurons. Again the pure states are the first non-trivial solutions to enter the stage. Substituting $m_\mu = m\delta_{\mu\nu}$ into (52) gives

$$m = \int Dz\,\tanh[\gamma(m + z\sqrt{T})] \qquad (53)$$

Writing (53) as $m^2 = \gamma m\int_0^m dk\,[1 - \int Dz\,\tanh^2[\gamma(k + z\sqrt{T})]] \leq \gamma m^2$ reveals that $m = 0$ as soon as $\gamma < 1$. A continuous transition to an $m > 0$ state occurs when $\gamma^{-1} = 1 - \int Dz\,\tanh^2[\gamma z\sqrt{T}]$. A parametrisation of this transition line in the $(\gamma, T)$-plane is given by

$$\gamma^{-1}(x) = 1 - \int Dz\,\tanh^2(zx), \qquad T(x) = x^2/\gamma^2(x), \qquad x \geq 0 \qquad (54)$$

Discontinuous transitions away from $m = 0$ (for which there is no evidence) would have to be calculated numerically. For $\gamma = \infty$ we get the equation $m = \mathrm{erf}[m/\sqrt{2T}]$, giving a continuous transition to $m > 0$ at $T_c = 2/\pi \approx 0.637$. Alternatively the latter number can also be found by taking $\lim_{x\to\infty}T(x)$ in the above parametrisation:

$$T_c(\gamma = \infty) = \lim_{x\to\infty}x^2\left[1 - \int Dz\,\tanh^2(zx)\right]^2 = \lim_{x\to\infty}\left[\int Dz\,\frac{\partial}{\partial z}\tanh(zx)\right]^2 = \left[2\int Dz\,\delta(z)\right]^2 = 2/\pi$$
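The parametrisation (54) and the limit $T_c(\gamma = \infty) = 2/\pi$ are easy to evaluate numerically. The sketch below assumes plain trapezoidal integration on a dense $z$-grid (a dense grid is needed because $\tanh(zx)$ becomes step-like for large $x$); the grid size and the function name are illustrative choices.

```python
import numpy as np


def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))


# Dense grid for the Gaussian measure Dz; |z| > 8 carries negligible weight.
Z_GRID = np.linspace(-8.0, 8.0, 320_001)
GAUSS = np.exp(-0.5 * Z_GRID**2) / np.sqrt(2.0 * np.pi)


def transition_point(x):
    """(gamma, T) on the transition line (54) for parameter x >= 0."""
    mean_tanh_sq = trapezoid(GAUSS * np.tanh(Z_GRID * x) ** 2, Z_GRID)
    gamma = 1.0 / (1.0 - mean_tanh_sq)
    return gamma, x**2 / gamma**2


for x in (0.001, 0.5, 1.0, 2.0, 10.0, 200.0):
    gamma, T = transition_point(x)
    print(f"x={x:8.3f}  gamma={gamma:10.4f}  T={T:.5f}")
print("2/pi =", 2.0 / np.pi)
```

Small $x$ gives the $T = 0$ end of the line ($\gamma \to 1$), and large $x$ approaches the $\gamma = \infty$ end, where $T$ tends to $2/\pi$.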

The resulting picture of the network's stationary state properties is illustrated in figure 6, which shows the phase diagram and the stationary recall overlaps of the pure states, obtained by numerical calculation and solution of equations (54) and (53).
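The stationary pure-state overlap follows from (53) by fixed-point iteration. The sketch below assumes Gauss-Hermite quadrature for $\int Dz$ and starts the iteration at $m = 1$, so that it picks out the largest solution (the recall state whenever it exists); the function name and iteration count are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, W = hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)  # now sum_k W[k] f(Z[k]) approximates int Dz f(z)


def pure_state_overlap(gamma, T, m_start=1.0, iterations=5000):
    """Iterate m <- int Dz tanh[gamma (m + z sqrt(T))], eq. (53)."""
    m = m_start
    for _ in range(iterations):
        m = float(np.sum(W * np.tanh(gamma * (m + Z * np.sqrt(T)))))
    return m


for gamma, T in [(0.5, 0.1), (4.0, 0.0), (4.0, 0.25), (4.0, 0.75)]:
    print(gamma, T, round(pure_state_overlap(gamma, T), 4))
```

For $\gamma < 1$ the iteration contracts to $m = 0$, as the inequality below (53) predicts; for $T = 0$ it reduces to the binary-neuron equation $m = \tanh(\gamma m)$.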



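Let us now turn to dynamics. It follows from (52) that the 'natural' initial conditions for $\bar{u}_\eta$ and $\Sigma_\eta$ are of the form $\bar{u}_\eta(0) = \eta\cdot\boldsymbol{k}_0$ and $\Sigma_\eta(0) = \Sigma_0$ for all $\eta$. Equivalently:

$$t = 0: \qquad \rho_\eta(u) = [2\pi\Sigma_0^2]^{-1/2}e^{-\frac{1}{2}[u-\eta\cdot\boldsymbol{k}_0]^2/\Sigma_0^2}, \qquad \boldsymbol{k}_0\in\mathbb{R}^p,\ \Sigma_0\in\mathbb{R}$$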
These would also be the typical and natural statistics if we were to prepare an initial firing state $\{s_i\}$ by hand, via manipulation of the potentials $\{u_i\}$. For such initial conditions we can simplify the dynamical equation (51) to

$$\boldsymbol{m}(t) = \left\langle\boldsymbol{\xi}\int Dz\,\tanh\left[\gamma\left(\boldsymbol{\xi}\cdot\left[\boldsymbol{k}_0e^{-t} + A\int_0^t ds\, e^{s-t}\boldsymbol{m}(s)\right] + z\sqrt{T + (\Sigma_0^2 - T)e^{-2t}}\right)\right]\right\rangle_{\boldsymbol{\xi}} \qquad (55)$$



For the special case of the Hopfield synapses, i.e. $A_{\mu\nu} = \delta_{\mu\nu}$, it follows from (55) that recall of a given pattern $\nu$ is triggered upon choosing $k_{0,\mu} = k_0\delta_{\mu\nu}$ (with $k_0 > 0$), since then equation (55) generates $m_\mu(t) = m(t)\delta_{\mu\nu}$ at any time, with the amplitude $m(t)$ following from

$$m(t) = \int Dz\,\tanh\left[\gamma\left(k_0e^{-t} + \int_0^t ds\, e^{s-t}m(s) + z\sqrt{T + (\Sigma_0^2 - T)e^{-2t}}\right)\right] \qquad (56)$$

which is the dynamical counterpart of equation (53) (to which indeed it reduces for $t\to\infty$).

We finally specialise further to the case where our Gaussian initial conditions are not only chosen to trigger recall of a single pattern $\boldsymbol{\xi}^\nu$, but in addition describe uniform membrane potentials within the sub-lattices, i.e. $k_{0,\mu} = k_0\delta_{\mu\nu}$ and $\Sigma_0 = 0$, so $\rho_\eta(u) = \delta[u - k_0\eta_\nu]$. Here we can derive from (56) at $t = 0$ the identity $m_0 = \tanh[\gamma k_0]$, which enables us to express $k_0$ as $k_0 = (2\gamma)^{-1}\log[(1+m_0)/(1-m_0)]$, and find (56) reducing to

$$m(t) = \int Dz\,\tanh\left[e^{-t}\log\left[\frac{1+m_0}{1-m_0}\right]^{\frac{1}{2}} + \gamma\left[\int_0^t ds\, e^{s-t}m(s) + z\sqrt{T(1-e^{-2t})}\right]\right] \qquad (57)$$

Solving this equation numerically leads to graphs such as those shown in figure 7 for the choice $\gamma = 4$ and $T\in\{0.25, 0.5, 0.75\}$. Compared to the overlap evolution in large networks of binary neurons (away from saturation) one immediately observes richer behaviour, e.g. non-monotonicity.
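Equation (57) can be solved numerically by noting that the memory integral $h(t) = \int_0^t ds\, e^{s-t}m(s)$ obeys $\frac{d}{dt}h = m - h$ with $h(0) = 0$, which turns the integral equation into an ordinary differential equation for $h$ plus an explicit expression for $m(t)$. The sketch below uses this reformulation with an explicit Euler step and Gauss-Hermite quadrature; the step size, the time horizon and the function name are illustrative assumptions, not the scheme used for the figure.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, W = hermegauss(60)
W = W / np.sqrt(2.0 * np.pi)  # quadrature for the Gaussian measure Dz


def overlap_dynamics(m0, gamma, T, t_max=30.0, dt=1e-2):
    """Solve (57) for m(t), with uniform initial potentials (Sigma_0 = 0)."""
    a0 = np.arctanh(m0)  # = (1/2) log[(1 + m0) / (1 - m0)]
    times = dt * np.arange(int(round(t_max / dt)) + 1)
    m = np.empty_like(times)
    h = 0.0  # h(t) = int_0^t ds e^{s-t} m(s)
    for k, t in enumerate(times):
        width = np.sqrt(T * (1.0 - np.exp(-2.0 * t)))
        m[k] = np.sum(W * np.tanh(np.exp(-t) * a0 + gamma * (h + Z * width)))
        h += dt * (m[k] - h)  # explicit Euler step for dh/dt = m - h
    return times, m


for T in (0.25, 0.5, 0.75):
    times, m = overlap_dynamics(m0=0.9, gamma=4.0, T=T)
    print(T, np.round(m[::500], 3))  # m(t) at t = 0, 5, 10, ..., 30
```

At $t = 0$ the expression returns $m_0$ exactly, and for large $t$ any limit of $m(t)$ has to satisfy the stationary equation (53).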

The analysis and results described in this section, which can be done and derived in a similar fashion for other networks with continuous units (such as coupled oscillators), are somewhat difficult to find in research papers. There are two reasons for this. Firstly, non-equilibrium statistical mechanical studies only started being carried out around 1988, and obviously concentrated at first on the (simpler) networks with binary variables. Secondly, due to the absence of detailed balance in networks of graded response neurons, the latter appear to have been suspected of consequently having highly complicated dynamics, and analysis terminated with pseudo-equilibrium studies. In retrospect that turns out to have been too pessimistic a view on the power of non-equilibrium statistical mechanics: one finds that dynamical tools can be applied without serious technical problems (although the calculations are somewhat more involved), and again yield interesting and explicit results in the form of phase diagrams and dynamical curves for macroscopic observables, with sensible physical interpretations.



4 Correlation- and Response-Functions 

We now turn to correlation functions $C_{ij}(t,t')$ and response functions $G_{ij}(t,t')$. These will become the language in which the generating functional methods are formulated, which will enable us to solve the dynamics of recurrent networks in the (complex) regime near saturation (we take $t > t'$):

$$C_{ij}(t,t') = \langle\sigma_i(t)\sigma_j(t')\rangle, \qquad G_{ij}(t,t') = \partial\langle\sigma_i(t)\rangle/\partial\theta_j(t') \qquad (58)$$

The $\{\sigma_i\}$ evolve in time according to the master equation for sequential updates (binary neurons), the parallel update rule (22) (binary neurons) or the Fokker-Planck equation (32) (continuous neurons). The $\theta_i$ represent thresholds and/or external stimuli, which are added to the local fields in the two binary cases, or added to the deterministic forces in the case of a Fokker-Planck equation (32). We retain $\theta_i(t) = \theta_i$, except for a perturbation $\delta\theta_j(t')$ applied at time $t'$ in defining the response function. Calculating averages such as (58) requires determining joint probability distributions involving neuron states at different times.






4.1 Fluctuation-Dissipation Theorems 

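Networks of Binary Neurons. For networks of binary neurons with discrete time dynamics of the form $p_{\ell+1}(\sigma) = \sum_{\sigma'}W[\sigma;\sigma']p_\ell(\sigma')$, the probability of observing a given 'path' $\sigma(\ell')\to\sigma(\ell'+1)\to\ldots\to\sigma(\ell-1)\to\sigma(\ell)$ of successive configurations between step $\ell'$ and step $\ell$ is given by the product of the corresponding transition matrix elements (without summation):

$$\mathrm{Prob}[\sigma(\ell'),\ldots,\sigma(\ell)] = W[\sigma(\ell);\sigma(\ell-1)]\,W[\sigma(\ell-1);\sigma(\ell-2)]\cdots W[\sigma(\ell'+1);\sigma(\ell')]\,p_{\ell'}(\sigma(\ell'))$$

This allows us to write

$$C_{ij}(\ell,\ell') = \sum_{\sigma(\ell')}\cdots\sum_{\sigma(\ell)}\mathrm{Prob}[\sigma(\ell'),\ldots,\sigma(\ell)]\,\sigma_i(\ell)\sigma_j(\ell') = \sum_{\sigma\sigma'}\sigma_i\sigma'_j\,W^{\ell-\ell'}[\sigma;\sigma']\,p_{\ell'}(\sigma') \qquad (59)$$

$$G_{ij}(\ell,\ell') = \sum_{\sigma\sigma'\sigma''}\sigma_i\,W^{\ell-\ell'-1}[\sigma;\sigma'']\left[\frac{\partial}{\partial\theta_j}W[\sigma'';\sigma']\right]p_{\ell'}(\sigma') \qquad (60)$$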

From (59) and (60) it follows that both $C_{ij}(\ell,\ell')$ and $G_{ij}(\ell,\ell')$ will in the stationary state, i.e. upon substituting $p_{\ell'}(\sigma') = p_\infty(\sigma')$, only depend on $\ell-\ell'$: $C_{ij}(\ell,\ell')\to C_{ij}(\ell-\ell')$ and $G_{ij}(\ell,\ell')\to G_{ij}(\ell-\ell')$. For this we do not require detailed balance. Detailed balance, however, leads to a simple relation between the response function $G_{ij}(\tau)$ and the temporal derivative of the correlation function $C_{ij}(\tau)$.

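We now turn to equilibrium systems, i.e. networks with symmetric synapses (and with all $J_{ii} = 0$ in the case of sequential dynamics). We calculate the derivative of the transition matrix that occurs in (60) by differentiating the equilibrium condition $p_{eq}(\sigma) = \sum_{\sigma'}W[\sigma;\sigma']p_{eq}(\sigma')$ with respect to external fields:

$$\frac{\partial}{\partial\theta_j}p_{eq}(\sigma) = \sum_{\sigma'}\left\{\frac{\partial W[\sigma;\sigma']}{\partial\theta_j}p_{eq}(\sigma') + W[\sigma;\sigma']\frac{\partial}{\partial\theta_j}p_{eq}(\sigma')\right\}$$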

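Detailed balance implies $p_{eq}(\sigma) = Z^{-1}e^{-\beta H(\sigma)}$ (in the parallel case we simply substitute the appropriate Hamiltonian $H\to\tilde{H}$), giving $\partial p_{eq}(\sigma)/\partial\theta_j = -[Z^{-1}\partial Z/\partial\theta_j + \beta\,\partial H(\sigma)/\partial\theta_j]\,p_{eq}(\sigma)$, so that

$$\sum_{\sigma'}\frac{\partial W[\sigma;\sigma']}{\partial\theta_j}p_{eq}(\sigma') = \beta\left\{\sum_{\sigma'}W[\sigma;\sigma']\frac{\partial H(\sigma')}{\partial\theta_j}p_{eq}(\sigma') - \frac{\partial H(\sigma)}{\partial\theta_j}p_{eq}(\sigma)\right\}$$

(the term containing $Z$ drops out). We now obtain for the response function (60) in equilibrium: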

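$$G_{ij}(\ell) = \beta\sum_{\sigma\sigma'}\sigma_i\,W^{\ell-1}[\sigma;\sigma']\left\{\sum_{\sigma''}W[\sigma';\sigma'']\frac{\partial H(\sigma'')}{\partial\theta_j}p_{eq}(\sigma'') - \frac{\partial H(\sigma')}{\partial\theta_j}p_{eq}(\sigma')\right\} \qquad (61)$$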

The structure of (61) is similar to what follows upon calculating the evolution of the equilibrium correlation function (59) in a single iteration step:

$$C_{ij}(\ell) - C_{ij}(\ell-1) = \sum_{\sigma\sigma'}\sigma_i\,W^{\ell-1}[\sigma;\sigma']\left\{\sum_{\sigma''}W[\sigma';\sigma'']\,\sigma''_j\,p_{eq}(\sigma'') - \sigma'_j\,p_{eq}(\sigma')\right\} \qquad (62)$$

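Finally we calculate the relevant derivatives of the two Hamiltonians $H(\sigma) = -\sum_{i<j}J_{ij}\sigma_i\sigma_j - \sum_i\theta_i\sigma_i$ and $\tilde{H}(\sigma) = -\sum_i\theta_i\sigma_i - \beta^{-1}\sum_i\log 2\cosh[\beta h_i(\sigma)]$ (with $h_i(\sigma) = \sum_j J_{ij}\sigma_j + \theta_i$):

$$\frac{\partial H(\sigma)}{\partial\theta_j} = -\sigma_j, \qquad \frac{\partial\tilde{H}(\sigma)}{\partial\theta_j} = -\sigma_j - \tanh[\beta h_j(\sigma)]$$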
For sequential dynamics we hereby arrive directly at a fluctuation-dissipation theorem. For parallel dynamics we need one more identity (which follows from the definition of the transition matrix in (22) and the detailed balance property) to transform the tanh occurring in the derivative of $\tilde{H}$:

$$\tanh[\beta h_j(\sigma')]\,p_{eq}(\sigma') = \sum_{\sigma''}\sigma''_j\,W[\sigma'';\sigma']\,p_{eq}(\sigma') = \sum_{\sigma''}W[\sigma';\sigma'']\,\sigma''_j\,p_{eq}(\sigma'')$$

For parallel dynamics $\ell$ and $\ell'$ are the real time labels $t$ and $t'$, and we obtain, with $\tau = t - t'$:

$$\text{Binary \& Parallel:} \qquad G_{ij}(\tau > 0) = -\beta[C_{ij}(\tau+1) - C_{ij}(\tau-1)], \qquad G_{ij}(\tau \leq 0) = 0 \qquad (63)$$

For the continuous-time version of sequential dynamics the time $t$ is defined as $t = \ell/N$, and the difference equation (62) becomes a differential equation. For perturbations at time $t'$ in the definition of the response function (60) to retain a non-vanishing effect at (re-scaled) time $t$ in the limit $N\to\infty$, they will have to be re-scaled as well: $\delta\theta_j(t')\to N\delta\theta_j(t')$. As a result:

$$\text{Binary \& Sequential:} \qquad G_{ij}(\tau) = -\beta\,\theta(\tau)\frac{d}{d\tau}C_{ij}(\tau) \qquad (64)$$

The need to re-scale perturbations in making the transition from discrete to continuous times has the same origin as the need to re-scale the random forces in the derivation of the continuous-time Langevin equation from a discrete-time process. Going from ordinary derivatives to functional derivatives (which is what happens in the continuous-time limit) implies replacing Kronecker deltas $\delta_{t,t'}$ by Dirac delta-functions according to $\delta_{t,t'}\to\Delta\,\delta(t-t')$, where $\Delta$ is the average duration of an iteration step. Equations (63) and (64) are examples of so-called fluctuation-dissipation theorems (FDT).
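The parallel-dynamics theorem (63) can be checked exactly on a network small enough for enumeration. The sketch below assumes a hypothetical network of $n = 4$ spins with randomly drawn symmetric synapses $J_{ij}$, thresholds $\theta_i$ and $\beta = 0.8$, and assumes the standard parallel Glauber form $W[\sigma;\sigma'] = \prod_i\frac{1}{2}[1+\sigma_i\tanh(\beta h_i(\sigma'))]$ for the transition matrix of (22). It evaluates (59) and (60), with $\partial W/\partial\theta_j$ taken by central finite differences, and compares both sides of (63); all helper names are illustrative.

```python
import itertools

import numpy as np


def spin_states(n):
    """All 2^n configurations sigma in {-1,1}^n, one per row."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n)))


def parallel_W(J, theta, beta, S):
    """W[a, b] = Prob[sigma(t+1) = S[a] | sigma(t) = S[b]] = prod_i (1/2)[1 + S[a,i] tanh(beta h_i(S[b]))]."""
    tanh_h = np.tanh(beta * (S @ J.T + theta))  # tanh_h[b, i] = tanh(beta h_i(S[b]))
    return np.prod(0.5 * (1.0 + S[:, None, :] * tanh_h[None, :, :]), axis=2)


def stationary(W):
    """Eigenvector of W for eigenvalue 1, normalised to a probability vector."""
    vals, vecs = np.linalg.eig(W)
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return p / p.sum()


def correlation(W, p, S, tau):
    """C_ij(tau) = sum_{sigma,sigma'} sigma_i sigma'_j W^tau[sigma;sigma'] p(sigma'), eq. (59)."""
    return S.T @ np.linalg.matrix_power(W, tau) @ (S * p[:, None])


def response(J, theta, beta, S, p, tau, eps=1e-5):
    """G_ij(tau), eq. (60), with dW/dtheta_j from central finite differences."""
    n = len(theta)
    W_power = np.linalg.matrix_power(parallel_W(J, theta, beta, S), tau - 1)
    G = np.empty((n, n))
    for j in range(n):
        d = np.zeros(n)
        d[j] = eps
        dW = (parallel_W(J, theta + d, beta, S) - parallel_W(J, theta - d, beta, S)) / (2.0 * eps)
        G[:, j] = S.T @ W_power @ (dW @ p)
    return G


rng = np.random.default_rng(1)
n, beta = 4, 0.8
J = rng.normal(size=(n, n))
J = 0.5 * (J + J.T)  # symmetric synapses, so detailed balance holds
theta = rng.normal(scale=0.3, size=n)
S = spin_states(n)
W = parallel_W(J, theta, beta, S)
p_eq = stationary(W)

for tau in (1, 2, 3):
    G = response(J, theta, beta, S, p_eq, tau)
    fdt = -beta * (correlation(W, p_eq, S, tau + 1) - correlation(W, p_eq, S, tau - 1))
    print(tau, np.max(np.abs(G - fdt)))
```

Making $J$ non-symmetric breaks detailed balance, and the two sides then generally no longer agree, which is a quick way to see that (63) relies on equilibrium.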



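Networks with Continuous Neurons. For systems described by a Fokker-Planck equation (32) the simplest way to calculate correlation- and response-functions is by first returning to the underlying discrete-time system and leaving the continuous time limit $\Delta\to 0$ until the end. In the preceding paper on statics we saw that for small but finite time-steps $\Delta$ the underlying discrete-time process is described by

$$t = \ell\Delta: \qquad p_{\ell\Delta+\Delta}(\sigma) = [1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]\,p_{\ell\Delta}(\sigma)$$

with $\ell = 0, 1, 2, \ldots$ and with the differential operator

$$\mathcal{L}_\sigma = -\sum_i\frac{\partial}{\partial\sigma_i}\left[f_i(\sigma) - T\frac{\partial}{\partial\sigma_i}\right]$$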
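From this it follows that the conditional probability density $p_{\ell\Delta}(\sigma|\sigma',\ell'\Delta)$ for finding state $\sigma$ at time $\ell\Delta$, given the system was in state $\sigma'$ at time $\ell'\Delta$, must be

$$p_{\ell\Delta}(\sigma|\sigma',\ell'\Delta) = [1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]^{\ell-\ell'}\,\delta[\sigma-\sigma'] \qquad (66)$$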

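Equation (66) will be our main building block. Firstly, we will calculate the correlations:

$$C_{ij}(\ell\Delta,\ell'\Delta) = \langle\sigma_i(\ell\Delta)\sigma_j(\ell'\Delta)\rangle = \int d\sigma\, d\sigma'\,\sigma_i\sigma'_j\,p_{\ell\Delta}(\sigma|\sigma',\ell'\Delta)\,p_{\ell'\Delta}(\sigma')$$

$$= \int d\sigma\,\sigma_i[1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]^{\ell-\ell'}\int d\sigma'\,\sigma'_j\,\delta[\sigma-\sigma']\,p_{\ell'\Delta}(\sigma')$$

$$= \int d\sigma\,\sigma_i[1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]^{\ell-\ell'}[\sigma_j\,p_{\ell'\Delta}(\sigma)]$$

At this stage we can take the limits $\Delta\to 0$ and $\ell,\ell'\to\infty$, with $t = \ell\Delta$ and $t' = \ell'\Delta$ finite, using $\lim_{\Delta\to 0}[1 + \Delta A]^{k/\Delta} = e^{kA}$:

$$C_{ij}(t,t') = \int d\sigma\,\sigma_i\,e^{(t-t')\mathcal{L}_\sigma}[\sigma_j\,p_{t'}(\sigma)] \qquad (67)$$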

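Next we turn to the response function. A perturbation applied at time $t' = \ell'\Delta$ to the Langevin forces $f_i(\sigma)$ comes in at the transition $\sigma(\ell'\Delta)\to\sigma(\ell'\Delta+\Delta)$. As with sequential dynamics binary networks, the perturbation is re-scaled with the step size $\Delta$ to retain significance as $\Delta\to 0$:

$$G_{ij}(\ell\Delta,\ell'\Delta) = \frac{\partial\langle\sigma_i(\ell\Delta)\rangle}{\Delta\,\partial\theta_j(\ell'\Delta)} = \frac{\partial}{\Delta\,\partial\theta_j(\ell'\Delta)}\int d\sigma\, d\sigma'\,\sigma_i\,p_{\ell\Delta}(\sigma|\sigma',\ell'\Delta)\,p_{\ell'\Delta}(\sigma')$$

$$= \int d\sigma\, d\sigma'\, d\sigma''\,\sigma_i\,p_{\ell\Delta}(\sigma|\sigma'',\ell'\Delta+\Delta)\left[\frac{1}{\Delta}\frac{\partial}{\partial\theta_j}p_{\ell'\Delta+\Delta}(\sigma''|\sigma',\ell'\Delta)\right]p_{\ell'\Delta}(\sigma')$$

$$= \int d\sigma\, d\sigma'\, d\sigma''\,\sigma_i[1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]^{\ell-\ell'-1}\delta[\sigma-\sigma'']\,\frac{1}{\Delta}\frac{\partial}{\partial\theta_j}[1 + \Delta\mathcal{L}_{\sigma''} + \mathcal{O}(\Delta^{3/2})]\,\delta[\sigma''-\sigma']\,p_{\ell'\Delta}(\sigma')$$

$$= -\int d\sigma\, d\sigma'\, d\sigma''\,\sigma_i[1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]^{\ell-\ell'-1}\delta[\sigma-\sigma'']\left[\frac{\partial}{\partial\sigma''_j} + \mathcal{O}(\Delta^{1/2})\right]\delta[\sigma''-\sigma']\,p_{\ell'\Delta}(\sigma')$$

$$= -\int d\sigma\,\sigma_i[1 + \Delta\mathcal{L}_\sigma + \mathcal{O}(\Delta^{3/2})]^{\ell-\ell'-1}\left[\frac{\partial}{\partial\sigma_j} + \mathcal{O}(\Delta^{1/2})\right]p_{\ell'\Delta}(\sigma)$$

We take the limits $\Delta\to 0$ and $\ell,\ell'\to\infty$, with $t = \ell\Delta$ and $t' = \ell'\Delta$ finite:

$$G_{ij}(t,t') = -\int d\sigma\,\sigma_i\,e^{(t-t')\mathcal{L}_\sigma}\frac{\partial}{\partial\sigma_j}p_{t'}(\sigma) \qquad (68)$$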

Equations (67) and (68) apply to arbitrary systems described by Fokker-Planck equations. In the case of conservative forces, i.e. $f_i(\sigma) = -\partial H(\sigma)/\partial\sigma_i$, and when the system is in an equilibrium state at time $t'$ so that $C_{ij}(t,t') = C_{ij}(t-t')$ and $G_{ij}(t,t') = G_{ij}(t-t')$, we can take a further step using $p_{t'}(\sigma) = p_{eq}(\sigma) = Z^{-1}e^{-\beta H(\sigma)}$. In that case, taking the time derivative of expression (67) gives

$$\frac{\partial}{\partial\tau}C_{ij}(\tau) = \int d\sigma\,\sigma_i\,e^{\tau\mathcal{L}_\sigma}\mathcal{L}_\sigma[\sigma_j\,p_{eq}(\sigma)]$$



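Working out the key term in this expression gives

$$\mathcal{L}_\sigma[\sigma_j\,p_{eq}(\sigma)] = -\sum_i\frac{\partial}{\partial\sigma_i}\left[f_i(\sigma) - T\frac{\partial}{\partial\sigma_i}\right][\sigma_j\,p_{eq}(\sigma)] = T\frac{\partial}{\partial\sigma_j}p_{eq}(\sigma) - \sum_i\frac{\partial}{\partial\sigma_i}[\sigma_j\,J_i(\sigma)]$$

with the components of the probability current density $J_i(\sigma) = [f_i(\sigma) - T\frac{\partial}{\partial\sigma_i}]p_{eq}(\sigma)$. In equilibrium, however, the current is zero by definition, so only the first term in the above expression survives. Insertion into our previous equation for $\partial C_{ij}(\tau)/\partial\tau$, and comparison with (68), leads to the FDT for continuous systems:

$$\text{Continuous:} \qquad G_{ij}(\tau) = -\beta\,\theta(\tau)\frac{d}{d\tau}C_{ij}(\tau) \qquad (69)$$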

We will now calculate the correlation and response functions explicitly, and verify the validity or 
otherwise of the FDT relations, for attractor networks away from saturation. 
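The continuous-time FDT (69) can be illustrated with the simplest conservative system: a single variable with the hypothetical choice $H(\sigma) = \frac{1}{2}\sigma^2$, i.e. $f(\sigma) = -\sigma$, an Ornstein-Uhlenbeck process with stationary variance $T$. The Monte Carlo sketch below assumes the exact one-step update of this process; it estimates $\frac{d}{d\tau}C(\tau)$ by a central difference over trajectories started in equilibrium, and the response $G(\tau)$ by applying a short force pulse $\theta(t) = \epsilon\,\delta(t-t')$ to a copy of the system that shares the same noise. Sample size, step size and pulse strength are illustrative choices, and the two sides agree only up to sampling and discretisation error.

```python
import numpy as np


def ou_fdt_estimates(T=0.5, tau=0.8, delta=0.2, dt=1e-2, n_samples=100_000, eps=1e-2, seed=0):
    """Return (G(tau), -beta dC/dtau) for d sigma/dt = -sigma + theta(t) + noise, with beta = 1/T."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-dt)
    kick = np.sqrt(T * (1.0 - decay**2))  # exact OU step; stationary variance is T
    x0 = rng.normal(0.0, np.sqrt(T), n_samples)  # equilibrium state at time t'
    x, x_pert = x0.copy(), x0.copy()
    k_minus, k_tau, k_plus = (int(round(s / dt)) for s in (tau - delta, tau, tau + delta))
    for k in range(1, k_plus + 1):
        noise = kick * rng.standard_normal(n_samples)
        pulse = eps / dt if k == 1 else 0.0  # impulse eps * delta(t - t') spread over one step
        x = x * decay + noise
        x_pert = x_pert * decay + pulse * (1.0 - decay) + noise  # same noise as x
        if k == k_minus:
            c_minus = np.mean(x * x0)
        if k == k_tau:
            G = np.mean(x_pert - x) / eps
    c_plus = np.mean(x * x0)
    dC_dtau = (c_plus - c_minus) / (2.0 * delta)
    return G, -dC_dtau / T


G, fdt_side = ou_fdt_estimates()
print(G, fdt_side, np.exp(-0.8))  # for this process both sides should be close to e^{-tau}
```

Sharing the noise between the perturbed and unperturbed copies removes the sampling error from the response estimate entirely, because this process is linear.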
4.2 Example: Simple Attractor Networks with Binary Neurons

Correlation- and Response Functions for Sequential Dynamics. We will consider the continuous time version of the sequential dynamics, with the local fields $h_i(\sigma) = \sum_j J_{ij}\sigma_j + \theta_i$, and the separable interaction matrix (18). We already solved the dynamics of this model for the case with zero external fields and away from saturation (i.e. $p \ll \sqrt{N}$). Having non-zero, or even time-dependent, external fields does not affect the calculation much; one adds the external fields to the internal ones and finds the macroscopic laws (20) for the overlaps with the stored patterns being replaced by

$$\frac{d}{dt}\boldsymbol{m}(t) = \lim_{N\to\infty}\frac{1}{N}\sum_i\boldsymbol{\xi}_i\tanh\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(t) + \theta_i(t)] - \boldsymbol{m}(t) \qquad (70)$$



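Fluctuations in the local fields are of vanishing order in $N$ (since the fluctuations in $\boldsymbol{m}$ are), so that one can easily derive from the master equation the following expressions for spin averages:

$$\frac{d}{dt}\langle\sigma_i(t)\rangle = \tanh\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(t) + \theta_i(t)] - \langle\sigma_i(t)\rangle \qquad (71)$$

$$i\neq j: \qquad \frac{d}{dt}\langle\sigma_i(t)\sigma_j(t)\rangle = \tanh\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(t) + \theta_i(t)]\,\langle\sigma_j(t)\rangle + \tanh\beta[\boldsymbol{\xi}_j\cdot A\boldsymbol{m}(t) + \theta_j(t)]\,\langle\sigma_i(t)\rangle - 2\langle\sigma_i(t)\sigma_j(t)\rangle \qquad (72)$$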



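Correlations at different times are calculated by applying (71) to situations where the microscopic state at time $t'$ is known exactly, i.e. where $p_{t'}(\sigma) = \delta_{\sigma,\sigma'}$ for some $\sigma'$:

$$\langle\sigma_i(t)\rangle|_{\sigma(t')=\sigma'} = \sigma'_i\,e^{-(t-t')} + \int_{t'}^t ds\, e^{s-t}\tanh\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(s;\sigma',t') + \theta_i(s)] \qquad (73)$$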



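with $\boldsymbol{m}(s;\sigma',t')$ denoting the solution of (70) following initial condition $\boldsymbol{m}(t') = \frac{1}{N}\sum_i\sigma'_i\boldsymbol{\xi}_i$. If we multiply both sides of (73) by $\sigma'_j$ and average over all possible states $\sigma'$ at time $t'$, we obtain in leading order in $N$:

$$\langle\sigma_i(t)\sigma_j(t')\rangle = \langle\sigma_i(t')\sigma_j(t')\rangle\,e^{-(t-t')} + \int_{t'}^t ds\, e^{s-t}\langle\tanh\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(s;\sigma(t'),t') + \theta_i(s)]\,\sigma_j(t')\rangle$$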



Because of the existence of deterministic laws for the overlaps $\boldsymbol{m}$ in the $N\to\infty$ limit, we know with probability one that during the stochastic process the actual value $\boldsymbol{m}(\sigma(t'))$ must be given by the solution of (70), evaluated at time $t'$. As a result we obtain, with $C_{ij}(t,t') = \langle\sigma_i(t)\sigma_j(t')\rangle$:

$$C_{ij}(t,t') = C_{ij}(t',t')\,e^{-(t-t')} + \int_{t'}^t ds\, e^{s-t}\tanh\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(s) + \theta_i(s)]\,\langle\sigma_j(t')\rangle \qquad (74)$$



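Similarly we obtain from the solution of (71) an equation for the leading order in $N$ of the response functions, by derivation with respect to external fields:

$$\frac{\partial\langle\sigma_i(t)\rangle}{\partial\theta_j(t')} = \beta\,\theta(t-t')\int_{t'}^t ds\, e^{s-t}\left[1 - \tanh^2\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(s) + \theta_i(s)]\right]\left[\frac{1}{N}\sum_k(\boldsymbol{\xi}_i\cdot A\boldsymbol{\xi}_k)\frac{\partial\langle\sigma_k(s)\rangle}{\partial\theta_j(t')} + \delta_{ij}\,\delta(s-t')\right]$$

or

$$G_{ij}(t,t') = \beta\,\delta_{ij}\,\theta(t-t')\,e^{-(t-t')}\left[1 - \tanh^2\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(t') + \theta_i(t')]\right] + \beta\,\theta(t-t')\int_{t'}^t ds\, e^{s-t}\left[1 - \tanh^2\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(s) + \theta_i(s)]\right]\frac{1}{N}\sum_k(\boldsymbol{\xi}_i\cdot A\boldsymbol{\xi}_k)\,G_{kj}(s,t') \qquad (75)$$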
For $t'\to t$ we retain in leading order in $N$ only the instantaneous single-site contribution

$$\lim_{t'\uparrow t}G_{ij}(t,t') = \beta\,\delta_{ij}\left[1 - \tanh^2\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(t) + \theta_i(t)]\right] \qquad (76)$$



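This leads to the following ansatz for the scaling with $N$ of the $G_{ij}(t,t')$, which can be shown to be correct by insertion into (75), in combination with the correctness at $t = t'$ following from (76):

$$i = j: \quad G_{ii}(t,t') = \mathcal{O}(1), \qquad i\neq j: \quad G_{ij}(t,t') = \mathcal{O}(N^{-1})$$

Note that this implies $\frac{1}{N}\sum_k(\boldsymbol{\xi}_i\cdot A\boldsymbol{\xi}_k)\,G_{kj}(s,t') = \mathcal{O}(\frac{1}{N})$. In leading order in $N$ we now find

$$G_{ij}(t,t') = \beta\,\delta_{ij}\,\theta(t-t')\,e^{-(t-t')}\left[1 - \tanh^2\beta[\boldsymbol{\xi}_i\cdot A\boldsymbol{m}(t') + \theta_i(t')]\right] \qquad (77)$$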

For those cases where the macroscopic laws (70) describe evolution to a stationary state $\mathbf{m}$, which obviously requires stationary external fields $\theta_i(t)=\theta_i$, we can take the limit $t\to\infty$, with $t-t'=\tau$ fixed, in the two results (74, 77). Using the $t\to\infty$ limits of (71, 72) we subsequently find time-translation-invariant expressions, $\lim_{t\to\infty}C_{ij}(t,t-\tau)=C_{ij}(\tau)$ and $\lim_{t\to\infty}G_{ij}(t,t-\tau)=G_{ij}(\tau)$, with, in leading order in $N$:



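$$
C_{ij}(\tau)=\tanh\beta[\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i]\,\tanh\beta[\boldsymbol{\xi}_j\cdot A\mathbf{m}+\theta_j]+\delta_{ij}\,e^{-\tau}\Big[1-\tanh^2\beta[\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i]\Big]\tag{78}
$$

$$
G_{ij}(\tau)=\beta\delta_{ij}\,\theta(\tau)\,e^{-\tau}\Big[1-\tanh^2\beta[\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i]\Big]\tag{79}
$$

for which indeed the Fluctuation-Dissipation Theorem (69) holds: $G_{ij}(\tau)=-\beta\theta(\tau)\frac{d}{d\tau}C_{ij}(\tau)$.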
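As a quick numerical sanity check of (78), (79) and the fluctuation-dissipation relation, the sketch below evaluates the stationary expressions for one pair of sites and compares $G_{ij}(\tau)$ with $-\beta\,dC_{ij}/d\tau$ using central finite differences. The function names and the numerical values standing in for $\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i$ are illustrative assumptions, not quantities taken from the text.

```python
import math

def stationary_seq(tau, beta, field_i, field_j, same_site):
    """Leading-order stationary C_ij(tau), G_ij(tau) for sequential dynamics, eqs (78)-(79).

    field_i, field_j stand for xi_i . A m + theta_i and xi_j . A m + theta_j at the
    stationary state; here they are simply given numbers (illustrative assumption).
    """
    ti = math.tanh(beta * field_i)
    tj = math.tanh(beta * field_j)
    delta = 1.0 if same_site else 0.0
    step = 1.0 if tau > 0 else 0.0          # Heaviside theta(tau), taking theta(0) = 0
    C = ti * tj + delta * math.exp(-tau) * (1.0 - ti * ti)
    G = beta * delta * step * math.exp(-tau) * (1.0 - ti * ti)
    return C, G

def fdt_residual(tau, beta, field_i, field_j, same_site, h=1e-5):
    """G_ij(tau) + beta dC_ij/dtau for tau > 0, via a central difference; ~0 if (69) holds."""
    c_plus, _ = stationary_seq(tau + h, beta, field_i, field_j, same_site)
    c_minus, _ = stationary_seq(tau - h, beta, field_i, field_j, same_site)
    _, g = stationary_seq(tau, beta, field_i, field_j, same_site)
    return g + beta * (c_plus - c_minus) / (2.0 * h)

for tau in (0.1, 0.5, 2.0):
    print(tau, fdt_residual(tau, beta=1.5, field_i=0.4, field_j=-0.2, same_site=True))
```

The finite-difference step is a numerical convenience; for $i\neq j$ both sides vanish identically, since $C_{ij}(\tau)$ is then constant and $G_{ij}(\tau)=0$.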



Correlation- and Response Functions for Parallel Dynamics. We now turn to the parallel dynamical rules (22), with the local fields $h_i(\boldsymbol{\sigma})=\sum_jJ_{ij}\sigma_j+\theta_i$ and the interaction matrix (18). As before, having time-dependent external fields amounts simply to adding these fields to the internal ones, and the earlier parallel-dynamics laws for the overlaps are found to be replaced by



$$
\mathbf{m}(t+1)=\lim_{N\to\infty}\frac{1}{N}\sum_i\boldsymbol{\xi}_i\tanh\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t)+\theta_i(t)\big]\tag{80}
$$
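The map (80) can be iterated directly once the site average is taken over a finite sample of pattern components. The sketch below does this in plain Python; it assumes hypothetical inputs (random patterns, a chosen matrix $A$, zero external fields) and replaces the $N\to\infty$ limit by a finite-$N$ average. For a single pattern ($p=1$, $A=1$) the map reduces exactly to $m(t+1)=\tanh(\beta m(t))$, because $\tanh$ is odd, which makes the output easy to check.

```python
import math
import random

def overlap_map(m, patterns, A, beta, theta=None):
    """One step of eq (80), with the N -> infinity limit replaced by a finite-N site average.

    m        : list of p overlaps m_mu(t)
    patterns : p lists of N entries xi_i^mu in {-1, +1}
    A        : p x p matrix A_{mu nu}
    theta    : optional list of N external fields theta_i(t) (zero if omitted)
    """
    p, n_sites = len(m), len(patterns[0])
    theta = theta if theta is not None else [0.0] * n_sites
    am = [sum(A[mu][nu] * m[nu] for nu in range(p)) for mu in range(p)]  # the vector A m
    new_m = [0.0] * p
    for i in range(n_sites):
        field = sum(patterns[mu][i] * am[mu] for mu in range(p)) + theta[i]  # xi_i . A m + theta_i
        t = math.tanh(beta * field)
        for mu in range(p):
            new_m[mu] += patterns[mu][i] * t / n_sites
    return new_m

rng = random.Random(0)
N = 500
patterns = [[rng.choice((-1, 1)) for _ in range(N)]]   # a single pattern (p = 1)
m = [0.5]
for _ in range(100):
    m = overlap_map(m, patterns, [[1.0]], beta=2.0)
print(m)   # close to the non-zero root of m = tanh(2 m)
```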



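Fluctuations in the local fields are again of vanishing order in $N$, and the parallel-dynamics versions of equations (71, 72), to be derived from (22), are found to be

$$
\langle\sigma_i(t+1)\rangle=\tanh\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t)+\theta_i(t)\big]\tag{81}
$$

$$
i\neq j:\quad\langle\sigma_i(t+1)\sigma_j(t+1)\rangle=\tanh\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t)+\theta_i(t)\big]\,\tanh\beta\big[\boldsymbol{\xi}_j\cdot A\mathbf{m}(t)+\theta_j(t)\big]\tag{82}
$$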

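With $\mathbf{m}(t;\boldsymbol{\sigma}',t')$ denoting the solution of the map (80) following the initial condition $\mathbf{m}(t')=\frac{1}{N}\sum_i\sigma'_i\boldsymbol{\xi}_i$, we immediately obtain from equations (81, 82) the correlation functions:

$$
C_{ij}(t,t)=\delta_{ij}+[1-\delta_{ij}]\tanh\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t-1)+\theta_i(t-1)\big]\tanh\beta\big[\boldsymbol{\xi}_j\cdot A\mathbf{m}(t-1)+\theta_j(t-1)\big]\tag{83}
$$

$$
\begin{aligned}
t>t':\quad C_{ij}(t,t')&=\Big\langle\tanh\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t-1;\boldsymbol{\sigma}(t'),t')+\theta_i(t-1)\big]\,\sigma_j(t')\Big\rangle\\
&=\tanh\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t-1)+\theta_i(t-1)\big]\tanh\beta\big[\boldsymbol{\xi}_j\cdot A\mathbf{m}(t'-1)+\theta_j(t'-1)\big]
\end{aligned}\tag{84}
$$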
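From (81) also follow equations determining the leading order in $N$ of the response functions $G_{ij}(t,t')$, by differentiation with respect to the external fields $\theta_j(t')$:

$$
G_{ij}(t,t')=\begin{cases}
0, & t'>t-1\\[4pt]
\beta\delta_{ij}\Big[1-\tanh^2\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t-1)+\theta_i(t-1)\big]\Big], & t'=t-1\\[4pt]
\beta\Big[1-\tanh^2\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t-1)+\theta_i(t-1)\big]\Big]\,\dfrac{1}{N}\sum_k(\boldsymbol{\xi}_i\cdot A\boldsymbol{\xi}_k)\,G_{kj}(t-1,t'), & t'<t-1
\end{cases}\tag{85}
$$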
It now follows iteratively that all off-diagonal elements must be of vanishing order in $N$: $G_{ij}(t,t-1)=\delta_{ij}G_{ii}(t,t-1)\to G_{ij}(t,t-2)=\delta_{ij}G_{ii}(t,t-2)\to\ldots$, so that in leading order

$$
G_{ij}(t,t')=\beta\delta_{ij}\,\delta_{t,t'+1}\Big[1-\tanh^2\beta\big[\boldsymbol{\xi}_i\cdot A\mathbf{m}(t')+\theta_i(t')\big]\Big]\tag{86}
$$



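For those cases where the macroscopic laws (80) describe evolution to a stationary state $\mathbf{m}$, with stationary external fields, we can take the limit $t\to\infty$, with $t-t'=\tau$ fixed, in (83, 84, 86). We find time-translation-invariant expressions, $\lim_{t\to\infty}C_{ij}(t,t-\tau)=C_{ij}(\tau)$ and $\lim_{t\to\infty}G_{ij}(t,t-\tau)=G_{ij}(\tau)$, with, in leading order in $N$:

$$
C_{ij}(\tau)=\tanh\beta[\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i]\,\tanh\beta[\boldsymbol{\xi}_j\cdot A\mathbf{m}+\theta_j]+\delta_{ij}\,\delta_{\tau,0}\Big[1-\tanh^2\beta[\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i]\Big]\tag{87}
$$

$$
G_{ij}(\tau)=\beta\delta_{ij}\,\delta_{\tau,1}\Big[1-\tanh^2\beta[\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i]\Big]\tag{88}
$$

obeying the Fluctuation-Dissipation Theorem (69): $G_{ij}(\tau>0)=-\beta\,[C_{ij}(\tau+1)-C_{ij}(\tau-1)]$.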
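The discrete-time version of the check is even simpler: with (87) and (88), the relation $G_{ii}(\tau)=-\beta[C_{ii}(\tau+1)-C_{ii}(\tau-1)]$ can be verified term by term. The sketch below does this for $i=j$ and $\tau=1,\ldots,6$; the field value (standing in for $\boldsymbol{\xi}_i\cdot A\mathbf{m}+\theta_i$) and $\beta$ are arbitrary illustrative numbers.

```python
import math

def stationary_par(tau, beta, field):
    """Single-site (i = j) stationary C_ii(tau) and G_ii(tau) for parallel dynamics, eqs (87)-(88).
    field stands for xi_i . A m + theta_i at the stationary state (a given number)."""
    t = math.tanh(beta * field)
    C = t * t + (1.0 - t * t) * (1.0 if tau == 0 else 0.0)
    G = beta * (1.0 - t * t) * (1.0 if tau == 1 else 0.0)
    return C, G

def fdt_check(beta, field, tau_max=6):
    """Largest |G(tau) + beta [C(tau+1) - C(tau-1)]| over tau = 1..tau_max (0 if the FDT holds)."""
    worst = 0.0
    for tau in range(1, tau_max + 1):
        c_next, _ = stationary_par(tau + 1, beta, field)
        c_prev, _ = stationary_par(tau - 1, beta, field)
        _, g = stationary_par(tau, beta, field)
        worst = max(worst, abs(g + beta * (c_next - c_prev)))
    return worst

print(fdt_check(beta=2.0, field=0.3))   # zero up to rounding
```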

4.3 Example: Graded Response Neurons with Uniform Synapses 

Let us finally find out how to calculate correlation and response functions for the simple network (38) of graded-response neurons, with (possibly time-dependent) external forces $\theta_i(t)$ and with uniform synapses $J_{ij}=J/N$:

$$
\frac{d}{dt}u_i(t)=\frac{J}{N}\sum_jg[\gamma u_j(t)]-u_i(t)+\theta_i(t)+\eta_i(t)\tag{89}
$$



For a given realisation of the external forces and the Gaussian noise variables $\{\eta_i(t)\}$ we can formally integrate (89) and find

$$
u_i(t)=u_i(0)\,e^{-t}+\int_0^t\!ds\;e^{s-t}\Big[J\!\int\!du\;p(u;\mathbf{u}(s))\,g[\gamma u]+\theta_i(s)+\eta_i(s)\Big]\tag{90}
$$



with the distribution of membrane potentials $p(u;\mathbf{u})=N^{-1}\sum_i\delta[u-u_i]$. The correlation function $C_{ij}(t,t')=\langle u_i(t)u_j(t')\rangle$ immediately follows from (90). Without loss of generality we can take $t\geq t'$. For absent external forces (which were only needed in order to define the response function), and upon using $\langle\eta_i(s)\rangle=0$ and $\langle\eta_i(s)\eta_j(s')\rangle=2T\delta_{ij}\delta(s-s')$, we arrive at



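$$
\begin{aligned}
C_{ij}(t,t')=\;&T\delta_{ij}\big(e^{t'-t}-e^{-t'-t}\big)\\
&+\Big\langle\Big[u_i(0)e^{-t}+J\!\int\!du\;g[\gamma u]\int_0^t\!ds\;e^{s-t}p(u;\mathbf{u}(s))\Big]\Big[u_j(0)e^{-t'}+J\!\int\!du\;g[\gamma u]\int_0^{t'}\!ds'\;e^{s'-t'}p(u;\mathbf{u}(s'))\Big]\Big\rangle
\end{aligned}
$$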



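For $N\to\infty$, however, we know the distribution of potentials to evolve deterministically, $p(u;\mathbf{u}(s))\to p_s(u)$, where $p_s(u)$ is the solution of the macroscopic law for the potential distribution derived earlier for this network. This allows us to simplify the above expression to

$$
N\to\infty:\quad C_{ij}(t,t')=T\delta_{ij}\big(e^{t'-t}-e^{-t'-t}\big)+\Big[u_i(0)e^{-t}+J\!\int\!du\;g[\gamma u]\int_0^t\!ds\;e^{s-t}p_s(u)\Big]\Big[u_j(0)e^{-t'}+J\!\int\!du\;g[\gamma u]\int_0^{t'}\!ds'\;e^{s'-t'}p_{s'}(u)\Big]\tag{91}
$$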
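The noise contribution $T\delta_{ij}(e^{t'-t}-e^{-t'-t})$ above comes from the double time integral over the white noise: the factor $\delta(s-s')$ collapses it to $2T\int_0^{\min(t,t')}ds\,e^{2s-t-t'}$. A minimal numerical check of that step, using a trapezoidal rule (our choice, purely for illustration):

```python
import math

def noise_cov_numeric(t, t_prime, T, n=20000):
    """Trapezoidal evaluation of 2T * int_0^{min(t,t')} e^{2s - t - t'} ds, i.e. the
    white-noise double integral after the delta function delta(s - s') is used."""
    upper = min(t, t_prime)
    h = upper / n
    f = lambda s: 2.0 * T * math.exp(2.0 * s - t - t_prime)
    total = 0.5 * (f(0.0) + f(upper)) + sum(f(k * h) for k in range(1, n))
    return total * h

def noise_cov_closed(t, t_prime, T):
    """Closed form T (e^{-|t - t'|} - e^{-t - t'}) of the noise part of C_ii(t, t')."""
    return T * (math.exp(-abs(t - t_prime)) - math.exp(-t - t_prime))

print(noise_cov_numeric(2.0, 1.0, 0.5), noise_cov_closed(2.0, 1.0, 0.5))
```

Writing $e^{-|t-t'|}$ makes the closed form symmetric in its two time arguments, so the ordering $t\geq t'$ chosen in the text is not needed by the code.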



Next we turn to the response function $G_{ij}(t,t')=\delta\langle u_i(t)\rangle/\delta\theta_j(t')$ (its definition involves functional rather than scalar differentiation, since time is continuous). After this differentiation the forces $\{\theta_i(s)\}$ can be put to zero. Functional differentiation of (90), followed by averaging, then leads us to

$$
G_{ij}(t,t')=\theta(t-t')\,\delta_{ij}\,e^{t'-t}-\frac{J}{N}\sum_k\int\!du\;g[\gamma u]\,\frac{\partial}{\partial u}\int_0^t\!ds\;e^{s-t}\lim_{\theta\to0}\Big\langle\delta[u-u_k(s)]\,\frac{\delta u_k(s)}{\delta\theta_j(t')}\Big\rangle
$$
In view of (90) we make the self-consistent ansatz $\delta u_k(s)/\delta\theta_j(s')=\mathcal{O}(N^{-1})$ for $k\neq j$. This produces

$$
N\to\infty:\quad G_{ij}(t,t')=\theta(t-t')\,\delta_{ij}\,e^{t'-t}\tag{92}
$$

Since the macroscopic dynamics evolves towards a stationary state, we can also take the limit $t\to\infty$, with $t-t'=\tau$ fixed, in (91). Assuming non-pathological decay of the distribution of potentials allows us to put $\lim_{t\to\infty}\int_0^t ds\,e^{s-t}p_s(u)=p(u)$ (the stationary solution of the macroscopic law for the potential distribution), with which we find not only (92) but also (91) reducing to time-translation-invariant expressions for $t\to\infty$, $\lim_{t\to\infty}C_{ij}(t,t-\tau)=C_{ij}(\tau)$ and $\lim_{t\to\infty}G_{ij}(t,t-\tau)=G_{ij}(\tau)$, in which

$$
C_{ij}(\tau)=T\delta_{ij}\,e^{-\tau}+\Big[J\!\int\!du\;p(u)\,g[\gamma u]\Big]^2,\qquad G_{ij}(\tau)=\theta(\tau)\,\delta_{ij}\,e^{-\tau}\tag{93}
$$



Clearly the leading orders in $N$ of these two functions obey the fluctuation-dissipation theorem (69), $G_{ij}(\tau)=-\beta\theta(\tau)\frac{d}{d\tau}C_{ij}(\tau)$, with $\beta=1/T$. As with the binary-neuron attractor networks for which we calculated the correlation and response functions earlier, the impact of detailed balance violation (occurring when $A_{\mu\nu}\neq A_{\nu\mu}$ in networks with binary neurons and synapses (18), and in all networks with graded-response neurons [1]) on the validity of the fluctuation-dissipation theorems vanishes for $N\to\infty$, provided our networks are relatively simple and evolve to a stationary state in terms of the macroscopic observables (the latter need not necessarily happen, see e.g. figures 1 and …). Detailed balance violation, however, would be noticed in the finite-size effects [12].



5 Dynamics in the Complex Regime 

The approach we followed so far to derive closed macroscopic laws from the microscopic equations fails when the number of attractors is no longer small compared to the number $N$ of microscopic neuronal variables. In statics we have seen [1] that, at the work-floor level, the fingerprint of complexity is the need to use replica theory, rather than the relatively simple and straightforward methods based on (or equivalent to) calculating the density of states for given realisations of the macroscopic observables. This is caused by the presence of a number of 'disorder' variables per degree of freedom which is proportional to $N$, over which we are forced to average the macroscopic laws. One finds that in dynamics this situation is reflected in the inability to find an exact set of closed equations for a finite number of observables (or densities). We will see that the natural dynamical counterpart of equilibrium replica theory is generating functional analysis.



5.1 Overview of Methods and Theories 

Let us return to the simplest setting in which to study the problem: single-pattern recall in an attractor neural network with $N$ binary neurons and $p=\alpha N$ stored patterns in the non-trivial regime, where $\alpha>0$. We choose parallel dynamics, i.e. (22), with Hebbian-type synapses of the form (18) with $A_{\mu\nu}=\delta_{\mu\nu}$, i.e. $J_{ij}=N^{-1}\sum_\mu\xi_i^\mu\xi_j^\mu$, giving us the parallel-dynamics version of the Hopfield model.

Our interest is in the recall overlap $m(\boldsymbol{\sigma})=N^{-1}\sum_i\sigma_i\xi_i^1$ between the system state and pattern one. We saw in [1] that for $N\to\infty$ the fluctuations in the values of the recall overlap $m$ will vanish, and that for initial states where all $\sigma_i(0)$ are drawn independently the overlap $m$ will obey

$$
m(t+1)=\int\!dz\;P_t(z)\tanh[\beta(m(t)+z)],\qquad P_t(z)=\lim_{N\to\infty}\frac{1}{N}\sum_i\Big\langle\delta\Big[z-\xi_i^1\sum_{\mu>1}\xi_i^\mu\frac{1}{N}\sum_{j\neq i}\xi_j^\mu\sigma_j(t)\Big]\Big\rangle\tag{94}
$$

and that all complications in a dynamical analysis of the $\alpha>0$ regime are concentrated in the calculation of the distribution $P_t(z)$ of the (generally non-trivial) interference noise.
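To make the interference noise in (94) concrete, the sketch below computes $z_i$ for a finite network with random patterns in the special state $\boldsymbol{\sigma}=\boldsymbol{\xi}^1$ (perfect recall of pattern one). In that state each $z_i$ is a sum of $(p-1)(N-1)$ independent $\pm1/N$ terms, so its sample mean should be near zero and its variance near $(p-1)(N-1)/N^2\approx\alpha$. The choice of state, $N$, $p$ and the random seed are illustrative assumptions; for general states $P_t(z)$ need not be Gaussian.

```python
import random

def interference_noise(patterns, sigma):
    """z_i = xi_i^1 sum_{mu>1} xi_i^mu (1/N) sum_{j != i} xi_j^mu sigma_j, as inside eq (94).
    patterns[0] is the condensed pattern xi^1; patterns[1:] are the non-condensed ones."""
    n_sites = len(sigma)
    # full (unnormalised) overlaps sum_j xi_j^mu sigma_j of the state with each pattern mu > 1
    overlaps = [sum(x * s for x, s in zip(pat, sigma)) for pat in patterns[1:]]
    z = []
    for i in range(n_sites):
        acc = 0.0
        for pat, total in zip(patterns[1:], overlaps):
            acc += pat[i] * (total - pat[i] * sigma[i]) / n_sites   # drop the j = i term
        z.append(patterns[0][i] * acc)
    return z

rng = random.Random(42)
N, p = 2000, 201                      # alpha = (p - 1) / N = 0.1
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
z = interference_noise(patterns, sigma=patterns[0])   # state equal to pattern 1
mean = sum(z) / N
var = sum((x - mean) ** 2 for x in z) / N
print(mean, var)   # mean near 0, variance near alpha = 0.1
```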



Gaussian Approximations. As a simple approximation one could just assume [13] that the $\sigma_i$ remain uncorrelated at all times, i.e. $\mathrm{Prob}[\sigma_i(t)=\pm\xi_i^1]=\frac12[1\pm m(t)]$ for all $t\geq0$, such that the argument given in [1] for $t=0$ (leading to a Gaussian $P(z)$) would hold generally, and where the mapping (94) would describe the overlap evolution at all times:

$$
P_t(z)=[2\pi\alpha]^{-1/2}e^{-z^2/2\alpha}:\qquad m(t+1)=\int\!Dz\;\tanh[\beta(m(t)+z\sqrt{\alpha})]\tag{95}
$$

with the Gaussian measure $Dz=(2\pi)^{-1/2}e^{-z^2/2}dz$. This equation, however, must be generally incorrect. Firstly, figure 5 in [1] shows that knowledge of $m(t)$ only does not permit prediction of $m(t+1)$. Secondly, expansion of the right-hand side of (95) for small $m(t)$ shows that (95) predicts a critical noise level (at $\alpha=0$) of $T_c=\beta^{-1}=1$, and a storage capacity (at $T=0$) of $\alpha_c=2/\pi\approx0.637$, whereas both numerical simulations and equilibrium statistical mechanical calculations point to $\alpha_c\approx0.139$. Rather than taking all $\sigma_i$ to be independent, a weaker assumption would be to just assume the interference noise distribution $P_t(z)$ to be a zero-average Gaussian one, at any time, with statistically independent noise variables $z$ at different times. One can then derive (for $N\to\infty$ and fully connected networks) an evolution equation for the width $\Sigma(t)$, giving [14, 15]:

$$
P_t(z)=[2\pi\Sigma^2(t)]^{-1/2}e^{-z^2/2\Sigma^2(t)}:\qquad m(t+1)=\int\!Dz\;\tanh[\beta(m(t)+z\Sigma(t))]
$$

$$
\Sigma^2(t+1)=\alpha+2\alpha\,m(t+1)\,m(t)\,h[m(t),\Sigma(t)]+\Sigma^2(t)\,h^2[m(t),\Sigma(t)]
$$

with $h[m,\Sigma]=\beta\big[1-\int Dz\,\tanh^2[\beta(m+z\Sigma)]\big]$. These equations describe correctly the qualitative features of recall dynamics, and are found to work well when retrieval actually occurs. For non-retrieval trajectories, however, they appear to underestimate the impact of interference noise: they predict $T_c=1$ (at $\alpha=0$) and a storage capacity (at $T=0$) of $\alpha_c\approx0.1597$ (which should have been about $0.139$). A final refinement of the Gaussian approach [16] consisted in allowing for correlations between the noise variables $z$ at different times (while still describing them by Gaussian distributions). This results in a hierarchy of macroscopic equations, which improve upon the previous Gaussian theories and even predict the correct stationary state and phase diagrams, but still fail to be correct at intermediate times. The fundamental problem with all Gaussian theories, however sophisticated, is clearly illustrated in figure 6 of [1]: the interference noise distribution is generally not of a Gaussian shape. $P_t(z)$ is only approximately Gaussian when pattern recall occurs. Hence the successes of Gaussian theories in describing recall trajectories, and their perpetual problems in describing the non-recall ones.
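At $T=0$ ($\beta\to\infty$) the Gaussian integrals of both approximations become error functions, which makes the quoted capacities easy to probe numerically. The sketch below assumes the limiting forms $\int Dz\,\tanh[\beta(m+zs)]\to\mathrm{erf}(m/s\sqrt2)$ and $h[m,\Sigma]\to\sqrt{2/\pi}\,e^{-m^2/2\Sigma^2}/\Sigma$ (our own evaluation of the $\beta\to\infty$ limit) and takes $\Sigma^2(0)=\alpha$ as the initial width, an assumption suited to an initial state with independent $\sigma_i$. The slope of the map (95) at $m=0$ is $\sqrt{2/(\pi\alpha)}$, which equals one at $\alpha=2/\pi$, in line with the value quoted in the text.

```python
import math

def gaussian_map_T0(m, alpha):
    """Eq (95) at T = 0: m(t+1) = int Dz sgn(m + z sqrt(alpha)) = erf(m / sqrt(2 alpha))."""
    return math.erf(m / math.sqrt(2.0 * alpha))

def slope_at_zero(alpha):
    """d m(t+1) / d m(t) at m = 0 for the map above; equal to 1 at alpha = 2/pi."""
    return math.sqrt(2.0 / (math.pi * alpha))

def width_step_T0(m, sigma2, alpha):
    """One step of the (m, Sigma^2) equations at T = 0, using the assumed limit
    h[m, Sigma] -> sqrt(2/pi) exp(-m^2 / (2 Sigma^2)) / Sigma."""
    sigma = math.sqrt(sigma2)
    m_next = math.erf(m / (math.sqrt(2.0) * sigma))
    h = math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * sigma2)) / sigma
    sigma2_next = alpha + 2.0 * alpha * m_next * m * h + sigma2 * h * h
    return m_next, sigma2_next

def iterate_gaussian(alpha, m0=1.0, steps=500):
    m = m0
    for _ in range(steps):
        m = gaussian_map_T0(m, alpha)
    return m

def iterate_width(alpha, m0=1.0, steps=500):
    m, sigma2 = m0, alpha        # assumption: Sigma^2(0) = alpha
    for _ in range(steps):
        m, sigma2 = width_step_T0(m, sigma2, alpha)
    return m, sigma2

print(slope_at_zero(2.0 / math.pi))               # approximately 1
print(iterate_gaussian(0.5), iterate_gaussian(0.8))
print(iterate_width(0.05)[0], iterate_width(0.3)[0])
```

Below $\alpha=2/\pi$ the first map keeps a non-zero overlap, above it the overlap dies out; the width-based equations lose retrieval at a much smaller $\alpha$, consistent with the $\alpha_c\approx0.16$ quoted above (the sketch only probes two values on either side, it does not locate $\alpha_c$).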

Non-Gaussian Approximations. In view of the non-Gaussian shape of the interference noise distribution, several attempts have been made at constructing non-Gaussian approximations. In all cases the aim is to arrive at a theory involving only macroscopic observables with a single time argument. Figure 6 of [1] suggests that for a fully connected network with binary neurons and parallel dynamics a more accurate ansatz for $P_t(z)$ would be the sum of two Gaussians. In [17] the following choice was proposed, guided by the structure of the exact formalism to be described later:

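$$
P_t(z)=P_t^+(z)+P_t^-(z),\qquad P_t^\pm(z)=\lim_{N\to\infty}\frac{1}{N}\sum_i\Big\langle\delta_{\sigma_i(t),\pm\xi_i^1}\,\delta\Big[z-\xi_i^1\sum_{\mu>1}\xi_i^\mu\frac{1}{N}\sum_{j\neq i}\xi_j^\mu\sigma_j(t)\Big]\Big\rangle
$$

$$
P_t^\pm(z)=\frac{1\pm m(t)}{2}\,\frac{e^{-[z\mp d(t)]^2/2\Sigma^2(t)}}{\Sigma(t)\sqrt{2\pi}}
$$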
followed by a self-consistent calculation of $d(t)$ (representing an effective 'retarded self-interaction', since it has an effect equivalent to adding $h_i(\boldsymbol{\sigma}(t))\to h_i(\boldsymbol{\sigma}(t))+d(t)\sigma_i(t)$), and of the width $\Sigma(t)$ of the two distributions $P_t^\pm(z)$, together with

$$
m(t+1)=\tfrac12[1+m(t)]\int\!Dz\;\tanh[\beta(m(t)+d(t)+z\Sigma(t))]+\tfrac12[1-m(t)]\int\!Dz\;\tanh[\beta(m(t)-d(t)+z\Sigma(t))]
$$

The resulting three-parameter theory, in the form of closed dynamic equations for $\{m,d,\Sigma\}$, is found to give a nice (but not perfect) agreement with numerical simulations.

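A different philosophy was followed in [18] (for sequential dynamics). First (as yet exact) equations are derived for the evolution of the two macroscopic observables $m(\boldsymbol{\sigma})=m_1(\boldsymbol{\sigma})$ and $r(\boldsymbol{\sigma})=\alpha^{-1}\sum_{\mu>1}m_\mu^2(\boldsymbol{\sigma})$, with $m_\mu(\boldsymbol{\sigma})=N^{-1}\sum_i\xi_i^\mu\sigma_i$, which are both found to involve $P_t(z)$:

$$
\frac{d}{dt}m=\int\!dz\;P_t(z)\tanh[\beta(m+z)]-m,\qquad\frac{d}{dt}r=\frac{1}{\alpha}\int\!dz\;P_t(z)\,z\tanh[\beta(m+z)]+1-r
$$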



Next one closes these equations by hand, using a maximum-entropy (or 'Occam's Razor') argument: instead of calculating $P_t(z)$ from (94) with the real (unknown) microscopic distribution $p_t(\boldsymbol{\sigma})$, it is calculated upon assigning equal probabilities to all states $\boldsymbol{\sigma}$ with $m(\boldsymbol{\sigma})=m$ and $r(\boldsymbol{\sigma})=r$, followed by averaging over all realisations of the stored patterns with $\mu>1$. In other words: one assumes (i) that the microscopic states visited by the system are 'typical' within the appropriate $(m,r)$ sub-shells of state space, and (ii) that one can average over the disorder. Assumption (ii) is harmless; the most important step is (i). This procedure results in an explicit (non-Gaussian) expression for the noise distribution in terms of $(m,r)$ only, a closed two-parameter theory which is exact for short times and in equilibrium, accurate predictions of the macroscopic flow in the $(m,r)$-plane (such as that shown in figure 5 of [1]), but (again) deviations in predicted time-dependencies at intermediate times. This theory, and its performance, was later improved by applying the same ideas to a derivation of a dynamic equation for the function $P_t(z)$ itself (rather than for $m$ and $r$ only) [19]; research is still under way with the aim to construct a theory along these lines which is fully exact.

Exact Results: Generating Functional Analysis. The only fully exact procedure available at present is known under various names, such as 'generating functional analysis', 'path integral formalism' or 'dynamic mean-field theory', and is based on a philosophy different from those described so far. Rather than working with the probability $p_t(\boldsymbol{\sigma})$ of finding a microscopic state $\boldsymbol{\sigma}$ at time $t$ in order to calculate the statistics of a set of macroscopic observables $\boldsymbol{\Omega}(\boldsymbol{\sigma})$ at time $t$, one here turns to the probability $\mathrm{Prob}[\boldsymbol{\sigma}(0),\ldots,\boldsymbol{\sigma}(t_m)]$ of finding a microscopic path $\boldsymbol{\sigma}(0)\to\boldsymbol{\sigma}(1)\to\cdots\to\boldsymbol{\sigma}(t_m)$. One also adds time-dependent external sources to the local fields, $h_i(\boldsymbol{\sigma})\to h_i(\boldsymbol{\sigma})+\theta_i(t)$, in order to probe the networks via perturbations and define a response function. The idea is to concentrate on the moment generating function $Z[\boldsymbol{\psi}]$, which, like $\mathrm{Prob}[\boldsymbol{\sigma}(0),\ldots,\boldsymbol{\sigma}(t_m)]$, fully captures the statistics of paths:

$$
Z[\boldsymbol{\psi}]=\Big\langle e^{-i\sum_i\sum_{t=0}^{t_m}\psi_i(t)\sigma_i(t)}\Big\rangle\tag{96}
$$

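It generates averages of the relevant observables, including those involving neuron states at different times, such as correlation functions $C_{ij}(t,t')=\langle\sigma_i(t)\sigma_j(t')\rangle$ and response functions $G_{ij}(t,t')=\partial\langle\sigma_i(t)\rangle/\partial\theta_j(t')$, upon differentiation with respect to the dummy variables $\{\psi_i(t)\}$:

$$
\langle\sigma_i(t)\rangle=i\lim_{\boldsymbol{\psi}\to0}\frac{\partial Z[\boldsymbol{\psi}]}{\partial\psi_i(t)},\qquad C_{ij}(t,t')=-\lim_{\boldsymbol{\psi}\to0}\frac{\partial^2Z[\boldsymbol{\psi}]}{\partial\psi_i(t)\,\partial\psi_j(t')},\qquad G_{ij}(t,t')=i\lim_{\boldsymbol{\psi}\to0}\frac{\partial^2Z[\boldsymbol{\psi}]}{\partial\psi_i(t)\,\partial\theta_j(t')}\tag{97}
$$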
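The identities (97) can be checked by brute force on a tiny system. The sketch below enumerates every path of a hypothetical two-neuron network under parallel Glauber dynamics, $\mathrm{Prob}[\sigma_i(t+1)=\pm1]=\frac12[1\pm\tanh\beta h_i(t)]$, builds $Z[\boldsymbol{\psi}]$ as a complex sum over paths, and recovers $\langle\sigma_i(t)\rangle$ and $C_{ij}(t,t')$ by central finite differences with respect to $\psi_i(t)$. The couplings, $\beta$, the fixed initial state and the step sizes are arbitrary illustrative choices.

```python
import cmath
import itertools
import math

def path_probability(path, J, theta, beta):
    """Probability of sigma(1..t_m) given sigma(0) = path[0] under parallel Glauber dynamics,
    with local fields h_i(t) = sum_j J_ij sigma_j(t) + theta[t][i]."""
    n = len(path[0])
    prob = 1.0
    for t in range(len(path) - 1):
        for i in range(n):
            h = sum(J[i][j] * path[t][j] for j in range(n)) + theta[t][i]
            prob *= 0.5 * (1.0 + path[t + 1][i] * math.tanh(beta * h))
    return prob

def paths(sigma0, t_max):
    """All paths (sigma(0), ..., sigma(t_max)) starting from the fixed state sigma0."""
    states = list(itertools.product((-1, 1), repeat=len(sigma0)))
    for rest in itertools.product(states, repeat=t_max):
        yield (tuple(sigma0),) + rest

def Z(psi, J, theta, beta, sigma0):
    """Generating function (96) by exhaustive enumeration of paths."""
    t_max = len(psi) - 1
    total = 0j
    for path in paths(sigma0, t_max):
        phase = sum(psi[t][i] * path[t][i] for t in range(t_max + 1) for i in range(len(sigma0)))
        total += path_probability(path, J, theta, beta) * cmath.exp(-1j * phase)
    return total

def bumped(psi, bumps):
    """Copy of psi with psi[t][i] += eps for every (t, i, eps) in bumps."""
    new = [row[:] for row in psi]
    for t, i, eps in bumps:
        new[t][i] += eps
    return new

def mean_via_Z(i, t, J, theta, beta, sigma0, psi0, h=1e-5):
    """<sigma_i(t)> = i dZ/dpsi_i(t) at psi = 0, eq (97), by a central difference."""
    dZ = (Z(bumped(psi0, [(t, i, h)]), J, theta, beta, sigma0)
          - Z(bumped(psi0, [(t, i, -h)]), J, theta, beta, sigma0)) / (2.0 * h)
    return (1j * dZ).real

def corr_via_Z(i, t, j, tp, J, theta, beta, sigma0, psi0, h=1e-4):
    """C_ij(t,t') = -d^2 Z / dpsi_i(t) dpsi_j(t') at psi = 0, eq (97), by mixed central differences."""
    def z(a, b):
        return Z(bumped(psi0, [(t, i, a), (tp, j, b)]), J, theta, beta, sigma0)
    d2 = (z(h, h) - z(h, -h) - z(-h, h) + z(-h, -h)) / (4.0 * h * h)
    return (-d2).real

def direct_average(f, J, theta, beta, sigma0, t_max):
    """Plain path average sum_paths Prob(path) f(path), for comparison."""
    return sum(path_probability(p, J, theta, beta) * f(p) for p in paths(sigma0, t_max))

J = [[0.0, 0.8], [-0.5, 0.0]]          # hypothetical two-neuron couplings
beta, sigma0, t_max = 1.5, (1, -1), 2
theta = [[0.0, 0.0] for _ in range(t_max + 1)]
psi0 = [[0.0, 0.0] for _ in range(t_max + 1)]
print(mean_via_Z(0, 2, J, theta, beta, sigma0, psi0),
      direct_average(lambda p: p[2][0], J, theta, beta, sigma0, t_max))
```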

Next one assumes (correctly) that for $N\to\infty$ only the statistical properties of the stored patterns will influence the macroscopic quantities, so that the generating function $Z[\boldsymbol{\psi}]$ can be averaged over all pattern realisations, i.e. $Z[\boldsymbol{\psi}]\to\overline{Z[\boldsymbol{\psi}]}$. As in replica theories (the canonical tool to deal with complexity in equilibrium) one carries out the disorder average before the average over the statistics of the neuron states, resulting for $N\to\infty$ in what can be interpreted as a theory describing a single 'effective' binary neuron $\sigma(t)$, with an effective local field $h(t)$ and the dynamics $\mathrm{Prob}[\sigma(t+1)=\pm1]=\frac12[1\pm\tanh[\beta h(t)]]$. However, this effective local field is found to generally depend on past states of the neuron, and on zero-average but temporally correlated Gaussian noise contributions $\phi(t)$:

$$
h(t|\{\sigma\},\{\phi\})=m(t)+\theta(t)+\alpha\sum_{t'<t}R(t,t')\,\sigma(t')+\sqrt{\alpha}\,\phi(t)\tag{98}
$$

The first comprehensive neural network studies along these lines, dealing with fully connected networks, were carried out in [20, 21], followed by applications to asymmetrically and symmetrically extremely diluted networks [22, 23] (we will come back to those later). More recent applications include sequence processing networks [24]. For $N\to\infty$ the differences between different models are found to show up only in the actual form taken by the effective local field (98), i.e. in the dependence of the 'retarded self-interaction' kernel $R(t,t')$ and the covariance matrix $\langle\phi(t)\phi(t')\rangle$ of the interference-induced Gaussian noise on the macroscopic objects $C=\{C(s,s')=\lim_{N\to\infty}\frac1N\sum_iC_{ii}(s,s')\}$ and $G=\{G(s,s')=\lim_{N\to\infty}\frac1N\sum_iG_{ii}(s,s')\}$. For instance (see also the two footnotes below):



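$$
\begin{array}{llll}
\text{model} & \text{synapses } J_{ij} & R(t,t') & \langle\phi(t)\phi(t')\rangle\\
\hline
\text{fully connected, static patterns} & \frac{1}{N}\sum_{\mu=1}^{\alpha N}\xi_i^\mu\xi_j^\mu & [(1-G)^{-1}G](t,t') & [(1-G)^{-1}C(1-G^\dagger)^{-1}](t,t')\\
\text{fully connected, pattern sequence} & \frac{1}{N}\sum_{\mu=1}^{\alpha N}\xi_i^{\mu+1}\xi_j^\mu & 0 & \sum_{n\geq0}[(G^\dagger)^nCG^n](t,t')\\
\text{symm extr diluted, static patterns} & \frac{c_{ij}}{c}\sum_{\mu=1}^{\alpha c}\xi_i^\mu\xi_j^\mu & G(t,t') & C(t,t')\\
\text{asymm extr diluted, static patterns} & \frac{c_{ij}}{c}\sum_{\mu=1}^{\alpha c}\xi_i^\mu\xi_j^\mu & 0 & C(t,t')
\end{array}
$$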



with the $c_{ij}$ drawn at random according to $P(c_{ij})=\frac{c}{N}\delta_{c_{ij},1}+(1-\frac{c}{N})\delta_{c_{ij},0}$ (either symmetrically, i.e. $c_{ij}=c_{ji}$, or independently) and where $c_{ii}=0$, $\lim_{N\to\infty}c/N=0$, and $c\to\infty$. In all cases the observables (overlaps and correlation- and response-functions) are to be solved from the following closed equations, involving the statistics of the single effective neuron experiencing the field (98):

$$
m(t)=\langle\sigma(t)\rangle,\qquad C(t,t')=\langle\sigma(t)\sigma(t')\rangle,\qquad G(t,t')=\partial\langle\sigma(t)\rangle/\partial\theta(t')\tag{99}
$$
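For the asymmetrically diluted row of the table ($R=0$, $\langle\phi(t)\phi(t')\rangle=C(t,t')$), the effective field (98) reduces to $m(t)+\theta(t)+\sqrt{\alpha}\,\phi(t)$ with $\phi(t)$ a unit-variance Gaussian at each time, since $C(t,t)=1$ for $\pm1$ neurons. Combining this with (99) gives $m(t+1)=\int Dz\,\tanh[\beta(m(t)+\theta(t)+\sqrt{\alpha}z)]$; this is our own short derivation from the table, not a formula quoted from the text. The sketch below compares that Gaussian integral (trapezoidal quadrature, our choice) with a direct Monte Carlo simulation of one step of the single effective neuron; the parameter values are illustrative.

```python
import math
import random

def gauss_average(f, n=4001, z_max=8.0):
    """Trapezoidal approximation of int Dz f(z), with Dz = exp(-z^2/2) dz / sqrt(2 pi)."""
    h = 2.0 * z_max / (n - 1)
    total = 0.0
    for k in range(n):
        z = -z_max + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * z * z) * f(z)
    return total * h / math.sqrt(2.0 * math.pi)

def overlap_step(m, alpha, beta, theta=0.0):
    """m(t+1) = int Dz tanh(beta [m(t) + theta(t) + sqrt(alpha) z]) (R = 0, C(t,t) = 1)."""
    return gauss_average(lambda z: math.tanh(beta * (m + theta + math.sqrt(alpha) * z)))

def sample_effective_neuron(m, alpha, beta, n_samples, rng, theta=0.0):
    """Monte Carlo estimate of <sigma(t+1)> for the effective neuron with field
    h = m + theta + sqrt(alpha) phi, phi ~ N(0, 1), and Prob[sigma = +1] = (1 + tanh(beta h)) / 2."""
    total = 0
    for _ in range(n_samples):
        field = m + theta + math.sqrt(alpha) * rng.gauss(0.0, 1.0)
        up = rng.random() < 0.5 * (1.0 + math.tanh(beta * field))
        total += 1 if up else -1
    return total / n_samples

rng = random.Random(1)
print(overlap_step(0.6, alpha=0.2, beta=2.0),
      sample_effective_neuron(0.6, alpha=0.2, beta=2.0, n_samples=100_000, rng=rng))
```

Only the single-time marginal of $\phi(t)$ enters the overlap equation here; the temporal correlations $C(t,t')$ matter for $C$ and $G$ themselves, which this sketch does not compute.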

It is now clear that Gaussian theories can at most produce exact results for asymmetric networks. Any degree of symmetry in the synapses is found to induce a non-zero retarded self-interaction, via the kernel $R(t,t')$, which constitutes a non-Gaussian contribution to the local fields. Exact closed macroscopic theories apparently require a number of macroscopic observables which grows as $\mathcal{O}(t^2)$ in order to predict the dynamics up to time $t$. In the case of sequential dynamics the picture is found to be very similar to the one above; instead of discrete time labels $t\in\{0,1,\ldots,t_m\}$, path summations and matrices, there one has a real time variable $t\in[0,t_m]$, path integrals and integral operators. The remainder of this paper is devoted to the derivation of the above results and their implications.
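For the fully connected static-pattern row of the table, both $R$ and the noise covariance follow from matrix algebra on $G$ and $C$ (with $G^\dagger=G^T$ for real matrices). Since $G(t,t')$ is a response function it vanishes for $t'\geq t$; we impose this causality on the example, which makes $G$ strictly lower-triangular and $(1-G)^{-1}$ a finite Neumann series. The sketch below uses made-up $G$ and $C$ matrices purely to show the operations; it does not solve the self-consistent equations (99).

```python
def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv_one_minus(G):
    """(1 - G)^{-1} for a nilpotent G (e.g. strictly triangular), via the finite
    Neumann series 1 + G + G^2 + ... + G^{n-1}."""
    n = len(G)
    result, power = identity(n), identity(n)
    for _ in range(n - 1):
        power = matmul(power, G)
        result = [[result[r][c] + power[r][c] for c in range(n)] for r in range(n)]
    return result

def fully_connected_kernels(G, C):
    """R = (1-G)^{-1} G and cov = (1-G)^{-1} C (1-G^T)^{-1}, as in the first table row."""
    inv = inv_one_minus(G)
    R = matmul(inv, G)
    cov = matmul(matmul(inv, C), inv_one_minus(transpose(G)))
    return R, cov

# Hypothetical causal response matrix and correlation matrix on times t = 0..3.
n = 4
G = [[0.3 if t - s == 1 else 0.1 if t - s == 2 else 0.0 for s in range(n)] for t in range(n)]
C = [[0.5 ** abs(t - s) for s in range(n)] for t in range(n)]
R, cov = fully_connected_kernels(G, C)
print(R)
print(cov)
```

The resulting $R$ is again strictly causal, as a retarded self-interaction should be, and the covariance is symmetric because $C$ is.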

Footnote: in the case of sequence recall the overlap $m$ is defined with respect to the 'moving' target, i.e. $m(t)=\frac{1}{N}\sum_i\sigma_i(t)\xi_i^t$.
Footnote: in the case of extremely diluted models the structure variables are also treated as disorder, and thus averaged out.
5.2 Generating Functional Analysis for Binary Neurons

General Definitions. I will now show more explicitly how the generating functional formalism works for networks of binary neurons. We define parallel dynamics, i.e. (22), driven as usual by local fields of the form $h_i(\boldsymbol{\sigma};t)=\sum_jJ_{ij}\sigma_j+\theta_i(t)$, but with a more general choice of Hebbian-type synapses, in which we allow for a possible random dilution (to reduce repetition in our subsequent derivations):

$$
J_{ij}=\frac{c_{ij}}{c}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu,\qquad p=\alpha c\tag{100}
$$

Architectural properties are reflected in the variables $c_{ij}\in\{0,1\}$, whereas information storage is to be effected by the remainder in (100), involving $p$ randomly and independently drawn patterns $\boldsymbol{\xi}^\mu=(\xi_1^\mu,\ldots,\xi_N^\mu)\in\{-1,1\}^N$. I will deal both with symmetric and with asymmetric architectures (always putting $c_{ii}=0$), in which the variables $c_{ij}$ are drawn randomly according to

$$
\text{symmetric:}\quad c_{ij}=c_{ji},\quad\forall\,i<j:\;\;P(c_{ij})=\frac{c}{N}\,\delta_{c_{ij},1}+\Big(1-\frac{c}{N}\Big)\delta_{c_{ij},0}\tag{101}
$$

$$
\text{asymmetric:}\quad\forall\,i\neq j:\;\;P(c_{ij})=\frac{c}{N}\,\delta_{c_{ij},1}+\Big(1-\frac{c}{N}\Big)\delta_{c_{ij},0}\tag{102}
$$

(one could also study intermediate degrees of symmetry; this would involve only simple adaptations).
Thus $c_{kl}$ is statistically independent of $c_{ij}$ as soon as $(k,l)\notin\{(i,j),(j,i)\}$. In leading order in $N$ one has $\langle\sum_jc_{ij}\rangle=c$ for all $i$, so $c$ gives the average number of neurons contributing to the field of any given neuron. In view of this, the number $p$ of patterns to be stored can be expected to scale as $p=\alpha c$. The connectivity parameter $c$ is chosen to diverge with $N$, i.e. $\lim_{N\to\infty}c^{-1}=0$. If $c=N$ we obtain the fully connected (parallel dynamics) Hopfield model. Extremely diluted networks are obtained when $\lim_{N\to\infty}c/N=0$.

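For simplicity we make the so-called 'condensed ansatz': we assume that the system state has an $\mathcal{O}(N^0)$ overlap only with a single pattern, say $\mu=1$. This situation is induced by initial conditions: we take a randomly drawn $\boldsymbol{\sigma}(0)$, generated by

$$
p(\boldsymbol{\sigma}(0))=\prod_i\Big\{\tfrac12[1+m_0]\,\delta_{\sigma_i(0),\xi_i^1}+\tfrac12[1-m_0]\,\delta_{\sigma_i(0),-\xi_i^1}\Big\},\qquad\text{so}\quad\frac{1}{N}\sum_i\xi_i^1\langle\sigma_i(0)\rangle=m_0\tag{103}
$$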

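The patterns $\mu>1$, as well as the architecture variables $c_{ij}$, are viewed as disorder. One assumes that for $N\to\infty$ the macroscopic behaviour of the system is 'self-averaging', i.e. only dependent on the statistical properties of the disorder (rather than on its microscopic realisation). Averages over the disorder are written as $\overline{\cdots}$. We next define the disorder-averaged generating function:

$$
\overline{Z[\boldsymbol{\psi}]}=\overline{\Big\langle e^{-i\sum_i\sum_t\psi_i(t)\sigma_i(t)}\Big\rangle}\tag{104}
$$

in which the time $t$ runs from $t=0$ to some (finite) upper limit $t_m$. Note that $\overline{Z[\mathbf{0}]}=1$. With a modest amount of foresight we define the macroscopic site-averaged and disorder-averaged objects $m(t)=N^{-1}\sum_i\xi_i^1\overline{\langle\sigma_i(t)\rangle}$, $C(t,t')=N^{-1}\sum_i\overline{\langle\sigma_i(t)\sigma_i(t')\rangle}$ and $G(t,t')=N^{-1}\sum_i\partial\overline{\langle\sigma_i(t)\rangle}/\partial\theta_i(t')$. According to (97) they can be obtained from (104) as follows:

$$
m(t)=\lim_{\boldsymbol{\psi}\to0}\frac{i}{N}\sum_j\xi_j^1\,\frac{\partial\overline{Z[\boldsymbol{\psi}]}}{\partial\psi_j(t)}\tag{105}
$$

$$
C(t,t')=-\lim_{\boldsymbol{\psi}\to0}\frac{1}{N}\sum_j\frac{\partial^2\overline{Z[\boldsymbol{\psi}]}}{\partial\psi_j(t)\,\partial\psi_j(t')},\qquad G(t,t')=\lim_{\boldsymbol{\psi}\to0}\frac{i}{N}\sum_j\frac{\partial^2\overline{Z[\boldsymbol{\psi}]}}{\partial\psi_j(t)\,\partial\theta_j(t')}\tag{106}
$$

So far we have only reduced our problem to the calculation of the function $\overline{Z[\boldsymbol{\psi}]}$ in (104), which will play a part similar to that of the disorder-averaged free energy in equilibrium calculations (see [1]).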
Figure 5: Dynamics of a simple network of $N$ graded-response neurons (38) with synapses $J_{ij}=J/N$ and non-linearity $g[z]=\frac12[1+\tanh(\gamma z)]$, for $N\to\infty$, $\gamma=J=1$, and $T\in\{0.25,0.5,1,2,4\}$. Left: evolution of the average membrane potential $\langle u\rangle$, with noise levels $T$ increasing from the top graph ($T=0.25$) to the bottom graph ($T=4$). Middle: evolution of the width $\Sigma$ of the membrane potential distribution, $\Sigma^2=\langle u^2\rangle-\langle u\rangle^2$, with noise levels decreasing from the top graph ($T=4$) to the bottom graph ($T=0.25$). Right: asymptotic ($t=\infty$) distribution of neural firing activities $p(s)=\langle\delta[s-g[u]]\rangle$, with noise levels increasing from the sharply peaked curve ($T=0.25$) to the almost flat curve ($T=4$).
Figure 6: Left: phase diagram of the Hopfield model with graded-response neurons and $J_{ij}=(2/N)\sum_\mu\xi_i^\mu\xi_j^\mu$, away from saturation. P: paramagnetic phase, no recall. R: pattern recall phase. Solid line: separation of the above phases, marked by a continuous transition. Right: asymptotic recall amplitudes $m=(2/N)\sum_i\xi_ig[\gamma u_i]$ of pure states (defined such that full recall corresponds to $m=1$), as functions of the noise level $T$, for $\gamma^{-1}\in\{0.1,0.2,\ldots,0.8,0.9\}$ (from top to bottom).
Figure 7: Overlap evolution in the Hopfield model with graded-response neurons and $J_{ij}=(2/N)\sum_\mu\xi_i^\mu\xi_j^\mu$, away from saturation. Gain parameter: $\gamma=4$. Initial conditions: $p_{\boldsymbol{\eta}}(u)=\delta[u-k_0\eta_\nu]$ (i.e. triggering recall of pattern $\nu$, with uniform membrane potentials within sub-lattices). Lines: recall amplitudes $m=(2/N)\sum_i\xi_i^\nu g[\gamma u_i]$ of pure state $\nu$ as functions of time, for $T=0.25$ (upper set), $T=0.5$ (middle set) and $T=0.75$ (lower set), following different initial overlaps $m_0\in\{0.1,0.2,\ldots,0.8,0.9\}$.
Evaluation of the Disorder-Averaged Generating Function. As in equilibrium replica calculations, the hope is that progress can be made by carrying out the disorder averages first. In equilibrium calculations we use the replica trick to convert our disorder averages into feasible ones; here the idea is to isolate the local fields at different times and different sites by inserting appropriate $\delta$-distributions:

$$
1=\prod_{it}\int\!dh_i(t)\;\delta\Big[h_i(t)-\sum_jJ_{ij}\sigma_j(t)-\theta_i(t)\Big]=\int\!\{dh\,d\hat h\}\;e^{i\sum_{it}\hat h_i(t)[h_i(t)-\sum_jJ_{ij}\sigma_j(t)-\theta_i(t)]}
$$

with $\{dh\,d\hat h\}=\prod_{it}[dh_i(t)\,d\hat h_i(t)/2\pi]$, giving

$$
\overline{Z[\boldsymbol{\psi}]}=\int\!\{dh\,d\hat h\}\;e^{i\sum_{it}\hat h_i(t)[h_i(t)-\theta_i(t)]}\;\overline{\Big\langle e^{-i\sum_{it}\psi_i(t)\sigma_i(t)}\,e^{-i\sum_{it}\hat h_i(t)\sum_jJ_{ij}\sigma_j(t)}\Big\rangle_{\rm pf}}
$$



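in which $\langle\ldots\rangle_{\rm pf}$ refers to averages over a constrained stochastic process of the type (22), but with prescribed fields $\{h_i(t)\}$ at all sites and at all times. Note that with such prescribed fields the probability of generating a path $\{\boldsymbol{\sigma}(0),\ldots,\boldsymbol{\sigma}(t_m)\}$ is given by

$$
\mathrm{Prob}[\boldsymbol{\sigma}(0),\ldots,\boldsymbol{\sigma}(t_m)\,|\,\{h_i(t)\}]=p(\boldsymbol{\sigma}(0))\;e^{\sum_{it}[\beta\sigma_i(t+1)h_i(t)-\log2\cosh[\beta h_i(t)]]}
$$

so

$$
\overline{Z[\boldsymbol{\psi}]}=\int\!\{dh\,d\hat h\}\sum_{\boldsymbol{\sigma}(0)}\cdots\sum_{\boldsymbol{\sigma}(t_m)}p(\boldsymbol{\sigma}(0))\;e^{N\mathcal{F}[\{\boldsymbol{\sigma}\},\{\hat{\mathbf{h}}\}]}\prod_{it}e^{i\hat h_i(t)[h_i(t)-\theta_i(t)]-i\psi_i(t)\sigma_i(t)+\beta\sigma_i(t+1)h_i(t)-\log2\cosh[\beta h_i(t)]}\tag{107}
$$

with

$$
\mathcal{F}[\{\boldsymbol{\sigma}\},\{\hat{\mathbf{h}}\}]=\frac{1}{N}\log\overline{e^{-i\sum_{it}\hat h_i(t)\sum_jJ_{ij}\sigma_j(t)}}\tag{108}
$$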
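The path probability with prescribed fields factorises over sites, and each single-site factor $e^{\beta\sigma(t+1)h(t)-\log2\cosh[\beta h(t)]}$ equals $\frac12[1+\sigma(t+1)\tanh\beta h(t)]$, so it is automatically normalised. A small single-site check of both statements (the field values and $\beta$ are arbitrary; the naive $\cosh$ used here would overflow for very large $\beta h$):

```python
import itertools
import math

def log_path_weight(path, fields, beta):
    """sum_t [beta sigma(t+1) h(t) - log 2cosh(beta h(t))] for one site with prescribed fields h(t)."""
    return sum(beta * path[t + 1] * fields[t] - math.log(2.0 * math.cosh(beta * fields[t]))
               for t in range(len(fields)))

def total_probability(sigma0, fields, beta):
    """Sum of exp(log_path_weight) over all continuations sigma(1), ..., sigma(t_m); should be 1."""
    total = 0.0
    for rest in itertools.product((-1, 1), repeat=len(fields)):
        total += math.exp(log_path_weight((sigma0,) + rest, fields, beta))
    return total

fields = [0.3, -1.2, 2.0, 0.0]
print(total_probability(1, fields, beta=1.7))
```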



We concentrate on the term $\mathcal{F}[\ldots]$ (with the disorder), of which we need only know the limit $N\to\infty$, since only terms inside $\overline{Z[\boldsymbol{\psi}]}$ which are exponential in $N$ will retain statistical relevance. In the disorder average of (108) every site $i$ plays an equivalent role, so the leading order in $N$ of (108) should depend only on site-averaged functions of the $\{\sigma_i(t),\hat h_i(t)\}$, with no reference to any special direction except the one defined by pattern $\boldsymbol{\xi}^1$. The simplest such functions with a single time variable are

$$
a(t;\{\boldsymbol{\sigma}\})=\frac{1}{N}\sum_i\xi_i^1\sigma_i(t),\qquad k(t;\{\hat{\mathbf{h}}\})=\frac{1}{N}\sum_i\xi_i^1\hat h_i(t)\tag{109}
$$

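whereas the simplest ones with two time variables would appear to be

$$
q(t,t';\{\boldsymbol{\sigma}\})=\frac{1}{N}\sum_i\sigma_i(t)\sigma_i(t'),\qquad Q(t,t';\{\hat{\mathbf{h}}\})=\frac{1}{N}\sum_i\hat h_i(t)\hat h_i(t')\tag{110}
$$

$$
K(t,t';\{\boldsymbol{\sigma},\hat{\mathbf{h}}\})=\frac{1}{N}\sum_i\hat h_i(t)\sigma_i(t')\tag{111}
$$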



It will turn out that all models of the type (100), with either (101) or (102), have the crucial property that (109, 110, 111) are in fact the only functions to appear in the leading order of (108):

$$
\mathcal{F}[\ldots]=\Phi\big[\{a(t;\ldots),k(t;\ldots),q(t,t';\ldots),Q(t,t';\ldots),K(t,t';\ldots)\}\big]+\ldots\qquad(N\to\infty)\tag{112}
$$



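for some as yet unknown function $\Phi[\ldots]$. This allows us to proceed with the evaluation of (107). We can achieve site factorisation in (107) if we isolate the macroscopic objects (109, 110, 111) by introducing suitable $\delta$-distributions (taking care that all exponents scale linearly with $N$, to secure statistical relevance). Thus we insert

$$
1=\prod_{t=0}^{t_m}\int\!da(t)\;\delta[a(t)-a(t;\{\boldsymbol{\sigma}\})]=\Big[\frac{N}{2\pi}\Big]^{t_m+1}\int\!da\,d\hat a\;e^{iN\sum_t\hat a(t)[a(t)-a(t;\{\boldsymbol{\sigma}\})]}
$$

$$
1=\prod_{t=0}^{t_m}\int\!dk(t)\;\delta[k(t)-k(t;\{\hat{\mathbf{h}}\})]=\Big[\frac{N}{2\pi}\Big]^{t_m+1}\int\!dk\,d\hat k\;e^{iN\sum_t\hat k(t)[k(t)-k(t;\{\hat{\mathbf{h}}\})]}
$$

$$
1=\prod_{t,t'=0}^{t_m}\int\!dq(t,t')\;\delta[q(t,t')-q(t,t';\{\boldsymbol{\sigma}\})]=\Big[\frac{N}{2\pi}\Big]^{(t_m+1)^2}\int\!dq\,d\hat q\;e^{iN\sum_{t,t'}\hat q(t,t')[q(t,t')-q(t,t';\{\boldsymbol{\sigma}\})]}
$$

$$
1=\prod_{t,t'=0}^{t_m}\int\!dQ(t,t')\;\delta[Q(t,t')-Q(t,t';\{\hat{\mathbf{h}}\})]=\Big[\frac{N}{2\pi}\Big]^{(t_m+1)^2}\int\!dQ\,d\hat Q\;e^{iN\sum_{t,t'}\hat Q(t,t')[Q(t,t')-Q(t,t';\{\hat{\mathbf{h}}\})]}
$$

$$
1=\prod_{t,t'=0}^{t_m}\int\!dK(t,t')\;\delta[K(t,t')-K(t,t';\{\boldsymbol{\sigma},\hat{\mathbf{h}}\})]=\Big[\frac{N}{2\pi}\Big]^{(t_m+1)^2}\int\!dK\,d\hat K\;e^{iN\sum_{t,t'}\hat K(t,t')[K(t,t')-K(t,t';\{\boldsymbol{\sigma},\hat{\mathbf{h}}\})]}
$$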





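Insertion of these integrals into (107), followed by insertion of (112) and usage of the short-hand

$$
\Psi[a,\hat a,k,\hat k,q,\hat q,Q,\hat Q,K,\hat K]=i\sum_t[\hat a(t)a(t)+\hat k(t)k(t)]+i\sum_{t,t'}[\hat q(t,t')q(t,t')+\hat Q(t,t')Q(t,t')+\hat K(t,t')K(t,t')]\tag{113}
$$

then leads us to

$$
\begin{aligned}
\overline{Z[\boldsymbol{\psi}]}=\;&\int\!da\,d\hat a\,dk\,d\hat k\,dq\,d\hat q\,dQ\,d\hat Q\,dK\,d\hat K\;e^{N\Psi[a,\hat a,k,\hat k,q,\hat q,Q,\hat Q,K,\hat K]+N\Phi[a,k,q,Q,K]+\mathcal{O}(\ldots)}\\
&\times\int\!\{dh\,d\hat h\}\sum_{\boldsymbol{\sigma}(0)}\cdots\sum_{\boldsymbol{\sigma}(t_m)}p(\boldsymbol{\sigma}(0))\prod_{it}e^{i\hat h_i(t)[h_i(t)-\theta_i(t)]-i\psi_i(t)\sigma_i(t)+\beta\sigma_i(t+1)h_i(t)-\log2\cosh[\beta h_i(t)]}\\
&\times\prod_ie^{-i\sum_t[\hat a(t)\xi_i^1\sigma_i(t)+\hat k(t)\xi_i^1\hat h_i(t)]-i\sum_{t,t'}[\hat q(t,t')\sigma_i(t)\sigma_i(t')+\hat Q(t,t')\hat h_i(t)\hat h_i(t')+\hat K(t,t')\hat h_i(t)\sigma_i(t')]}
\end{aligned}\tag{114}
$$

in which the term denoted as $\mathcal{O}(\ldots)$ covers both the non-dominant orders in (108) and the $\mathcal{O}(\log N)$ relics of the various pre-factors $[N/2\pi]$ in the above integral representations of the $\delta$-distributions (note: $t_m$ was assumed fixed). We now see explicitly in (114) that the summations and integrations over neuron states and local fields fully factorise over the $N$ sites. A simple transformation $\{\sigma_i(t),h_i(t),\hat h_i(t)\}\to\{\xi_i^1\sigma_i(t),\xi_i^1h_i(t),\xi_i^1\hat h_i(t)\}$ brings these site-factorised sums and integrals in (114) into the form $e^{N\Xi[\hat a,\hat k,\hat q,\hat Q,\hat K]}$, with

$$
\begin{aligned}
\Xi[\hat a,\hat k,\hat q,\hat Q,\hat K]=\;&\frac{1}{N}\sum_i\log\int\!\{dh\,d\hat h\}\sum_{\sigma(0)\cdots\sigma(t_m)}\pi_0(\sigma(0))\,e^{\sum_t\{i\hat h(t)[h(t)-\xi_i^1\theta_i(t)]-i\xi_i^1\psi_i(t)\sigma(t)\}}\\
&\times e^{\sum_t\{\beta\sigma(t+1)h(t)-\log2\cosh[\beta h(t)]\}-i\sum_t[\hat a(t)\sigma(t)+\hat k(t)\hat h(t)]-i\sum_{t,t'}[\hat q(t,t')\sigma(t)\sigma(t')+\hat Q(t,t')\hat h(t)\hat h(t')+\hat K(t,t')\hat h(t)\sigma(t')]}
\end{aligned}\tag{115}
$$

in which $\{dh\,d\hat h\}=\prod_t[dh(t)\,d\hat h(t)/2\pi]$ and $\pi_0(\sigma)=\frac12[1+m_0]\delta_{\sigma,1}+\frac12[1-m_0]\delta_{\sigma,-1}$. At this stage (114) acquires the form of an integral to be evaluated via the saddle-point (or 'steepest descent') method:

$$
\overline{Z[\{\psi(t)\}]}=\int\!da\,d\hat a\,dk\,d\hat k\,dq\,d\hat q\,dQ\,d\hat Q\,dK\,d\hat K\;e^{N\{\Psi[\ldots]+\Phi[\ldots]+\Xi[\ldots]\}+\mathcal{O}(\ldots)}\tag{116}
$$

in which the functions $\Psi[\ldots]$, $\Phi[\ldots]$ and $\Xi[\ldots]$ are defined by (112, 113, 115).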
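The mechanism behind the saddle-point evaluation of (116) can be illustrated with a one-dimensional real integral: for $I_N=\int_a^b e^{Nf(x)}dx$ one has $N^{-1}\log I_N\to\max_xf(x)$, with corrections of order $N^{-1}\log N$ (the analogue of the $\mathcal{O}(\log N)$ relics mentioned above). The sketch below uses the toy choice $f(x)=\sin x$ on $[0,\pi]$, for which the Gaussian (Laplace) estimate is $1+\log(2\pi/N)/(2N)$. It is only an analogy: the true surface (117) is multi-dimensional and complex-valued.

```python
import math

def scaled_log_integral(f, a, b, N, n=20001):
    """(1/N) log int_a^b exp(N f(x)) dx by the trapezoidal rule, factoring out
    exp(N max f) on the grid to avoid overflow."""
    h = (b - a) / (n - 1)
    xs = [a + k * h for k in range(n)]
    f_max = max(f(x) for x in xs)
    s = sum((0.5 if k in (0, n - 1) else 1.0) * math.exp(N * (f(x) - f_max))
            for k, x in enumerate(xs))
    return f_max + math.log(s * h) / N

for N in (10, 100, 1000):
    value = scaled_log_integral(math.sin, 0.0, math.pi, N)
    laplace = 1.0 + math.log(2.0 * math.pi / N) / (2.0 * N)   # Gaussian fluctuations around x* = pi/2
    print(N, value, laplace)
```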
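The Saddle-Point Problem. The disorder-averaged generating function (116) is for $N\to\infty$ dominated by the physical saddle-point of the macroscopic surface

$$
\Psi[a,\hat a,k,\hat k,q,\hat q,Q,\hat Q,K,\hat K]+\Phi[a,k,q,Q,K]+\Xi[\hat a,\hat k,\hat q,\hat Q,\hat K]\tag{117}
$$

with the three contributions defined in (112, 113, 115). It will be advantageous at this stage to define the following effective measure (which will be further simplified later):

$$
\langle f[\{\sigma\},\{h\},\{\hat h\}]\rangle_*=\frac{1}{N}\sum_i\frac{\int\{dh\,d\hat h\}\sum_{\sigma(0)\cdots\sigma(t_m)}M_i[\{\sigma\},\{h\},\{\hat h\}]\,f[\{\sigma\},\{h\},\{\hat h\}]}{\int\{dh\,d\hat h\}\sum_{\sigma(0)\cdots\sigma(t_m)}M_i[\{\sigma\},\{h\},\{\hat h\}]}\tag{118}
$$

with

$$
\begin{aligned}
M_i[\{\sigma\},\{h\},\{\hat h\}]=\;&\pi_0(\sigma(0))\,e^{\sum_t\{i\hat h(t)[h(t)-\xi_i^1\theta_i(t)]-i\xi_i^1\psi_i(t)\sigma(t)+\beta\sigma(t+1)h(t)-\log2\cosh[\beta h(t)]\}}\\
&\times e^{-i\sum_t[\hat a(t)\sigma(t)+\hat k(t)\hat h(t)]-i\sum_{t,t'}[\hat q(t,t')\sigma(t)\sigma(t')+\hat Q(t,t')\hat h(t)\hat h(t')+\hat K(t,t')\hat h(t)\sigma(t')]}
\end{aligned}
$$

in which the values to be inserted for $\{\hat a(t),\hat k(t),\hat q(t,t'),\hat Q(t,t'),\hat K(t,t')\}$ are given by the saddle-point of (117). Variation of (117) with respect to all the original macroscopic objects occurring as arguments (those without the 'hats') gives the following set of saddle-point equations:

$$
\hat a(t)=i\,\frac{\partial\Phi}{\partial a(t)},\qquad\hat k(t)=i\,\frac{\partial\Phi}{\partial k(t)}\tag{119}
$$

$$
\hat q(t,t')=i\,\frac{\partial\Phi}{\partial q(t,t')},\qquad\hat Q(t,t')=i\,\frac{\partial\Phi}{\partial Q(t,t')},\qquad\hat K(t,t')=i\,\frac{\partial\Phi}{\partial K(t,t')}\tag{120}
$$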

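Variation of (117) with respect to the conjugate macroscopic objects (those with the 'hats'), in turn, and usage of our newly introduced short-hand notation $\langle\ldots\rangle_*$, gives:

$$
a(t)=\langle\sigma(t)\rangle_*,\qquad k(t)=\langle\hat h(t)\rangle_*\tag{121}
$$

$$
q(t,t')=\langle\sigma(t)\sigma(t')\rangle_*,\qquad Q(t,t')=\langle\hat h(t)\hat h(t')\rangle_*,\qquad K(t,t')=\langle\hat h(t)\sigma(t')\rangle_*\tag{122}
$$

The coupled equations (119, 120, 121, 122) are to be solved simultaneously, once we have calculated the term $\Phi[\ldots]$ (112) which depends on the synapses. This appears to be a formidable task; it can, however, be simplified considerably upon first deriving the physical meaning of the above macroscopic quantities. We apply (105, 106) to (116), using identities such as the following, in which $\langle f\rangle_j$ denotes the single-site ratio $\int\{dh\,d\hat h\}\sum_{\sigma(0)\cdots\sigma(t_m)}M_j\,f\,\big/\int\{dh\,d\hat h\}\sum_{\sigma(0)\cdots\sigma(t_m)}M_j$ that appears inside (118):

$$
\frac{\partial\Xi[\ldots]}{\partial\psi_j(t)}=-\frac{i}{N}\,\xi_j^1\langle\sigma(t)\rangle_j,\qquad\frac{\partial\Xi[\ldots]}{\partial\theta_j(t)}=-\frac{i}{N}\,\xi_j^1\langle\hat h(t)\rangle_j
$$

$$
\frac{\partial^2\Xi[\ldots]}{\partial\psi_j(t)\,\partial\psi_j(t')}=-\frac{1}{N}\Big[\langle\sigma(t)\sigma(t')\rangle_j-\langle\sigma(t)\rangle_j\langle\sigma(t')\rangle_j\Big]
$$

$$
\frac{\partial^2\Xi[\ldots]}{\partial\theta_j(t)\,\partial\theta_j(t')}=-\frac{1}{N}\Big[\langle\hat h(t)\hat h(t')\rangle_j-\langle\hat h(t)\rangle_j\langle\hat h(t')\rangle_j\Big]
$$

$$
\frac{\partial^2\Xi[\ldots]}{\partial\psi_j(t)\,\partial\theta_j(t')}=-\frac{1}{N}\Big[\langle\sigma(t)\hat h(t')\rangle_j-\langle\sigma(t)\rangle_j\langle\hat h(t')\rangle_j\Big]
$$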






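and using the short-hand notation (118) wherever possible. Note that the external fields $\{\psi_i(t),\theta_i(t)\}$ occur only in the function $\Xi[\ldots]$, not in $\Psi[\ldots]$ or $\Phi[\ldots]$, and that overall constants in $\overline{Z[\boldsymbol{\psi}]}$ can always be recovered a posteriori, using $\overline{Z[\mathbf{0}]}=1$:

$$
m(t)=\lim_{\boldsymbol{\psi}\to0}\frac{i}{N}\sum_i\xi_i^1\,\frac{\int da\cdots d\hat K\;N\frac{\partial\Xi}{\partial\psi_i(t)}\,e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}{\int da\cdots d\hat K\;e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}=\lim_{\boldsymbol{\psi}\to0}\langle\sigma(t)\rangle_*
$$

$$
C(t,t')=-\lim_{\boldsymbol{\psi}\to0}\frac{1}{N}\sum_i\frac{\int da\cdots d\hat K\,\Big[N\frac{\partial^2\Xi}{\partial\psi_i(t)\partial\psi_i(t')}+N\frac{\partial\Xi}{\partial\psi_i(t)}\,N\frac{\partial\Xi}{\partial\psi_i(t')}\Big]e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}{\int da\cdots d\hat K\;e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}=\lim_{\boldsymbol{\psi}\to0}\langle\sigma(t)\sigma(t')\rangle_*
$$

$$
iG(t,t')=-\lim_{\boldsymbol{\psi}\to0}\frac{1}{N}\sum_i\frac{\int da\cdots d\hat K\,\Big[N\frac{\partial^2\Xi}{\partial\psi_i(t)\partial\theta_i(t')}+N\frac{\partial\Xi}{\partial\psi_i(t)}\,N\frac{\partial\Xi}{\partial\theta_i(t')}\Big]e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}{\int da\cdots d\hat K\;e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}=\lim_{\boldsymbol{\psi}\to0}\langle\sigma(t)\hat h(t')\rangle_*
$$

Finally we obtain useful identities from the seemingly trivial statements $N^{-1}\sum_i\xi_i^1\,\partial\overline{Z[\mathbf{0}]}/\partial\theta_i(t)=0$ and $N^{-1}\sum_i\partial^2\overline{Z[\mathbf{0}]}/\partial\theta_i(t)\partial\theta_i(t')=0$:

$$
0=\lim_{\boldsymbol{\psi}\to0}\frac{i}{N}\sum_i\xi_i^1\,\frac{\int da\cdots d\hat K\;N\frac{\partial\Xi}{\partial\theta_i(t)}\,e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}{\int da\cdots d\hat K\;e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}=\lim_{\boldsymbol{\psi}\to0}\langle\hat h(t)\rangle_*
$$

$$
0=-\lim_{\boldsymbol{\psi}\to0}\frac{1}{N}\sum_i\frac{\int da\cdots d\hat K\,\Big[N\frac{\partial^2\Xi}{\partial\theta_i(t)\partial\theta_i(t')}+N\frac{\partial\Xi}{\partial\theta_i(t)}\,N\frac{\partial\Xi}{\partial\theta_i(t')}\Big]e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}{\int da\cdots d\hat K\;e^{N[\Psi+\Phi+\Xi]+\mathcal{O}(\ldots)}}=\lim_{\boldsymbol{\psi}\to0}\langle\hat h(t)\hat h(t')\rangle_*
$$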

In combination with (121, 122| ), the above five identities simplify our problem considerably. The dummy 
fields ipi{t) have served their purpose and will now be put to zero, as a result we can now identify our 
macroscopic observables at the relevant saddle-point as: 

a{t) = m{t) k{t) = q{t,t') = C{t,t') Q{t,t') = K{t,t') = iG{t' ,t) (123) 

Finally we make a convenient choice for the external fields, 9i{t) = ^j9{t), with which the effective 
measure (. . .)^ of (124) simplifies to 



{f[{a},{h},{h}]). 



J{dhdh}j:.^oy..^it^)M[{a}, {h}, {h}] f[{a}, {h}, {h}] 
J{dhdh}Z.^,y.^^,^)M[{a}, {h}, {h}] 



(124) 



with 



M[{a}, {h}, {h}] = vro(o-(0)) eEt{»'^(*)[^(*)-^(*)l+/3'^(*+i)^(*)-i°s 2cosh[/3h(t)]}-iX;ja(t)a(t)+fc(t)h{t)] 

-*Et t'['?(*'*')'^W'^(*') + Q(*'*')^(*)/l{t')+^{t:<')'i(*)<^{t')] 



xe 



In summary: our saddle-point equations are given by ( p.l9| , |12C ,121 122| ), and the physical meaning of 
the macroscopic quantities is given by ( |123| ) (apparently many of them must be zero). Our final task 
is finding (|112|), i.e. calculating the leading order of 



N 



log 



(125) 



which is where the properties of the synapses ( [LOOD come in 
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5.3 Parallel Dynamics Hopfield Model Near Saturation 



The Disorder Average. The fully connected Hopfield [|| network (here with parallel dynamics) is 
obtained upon choosing c = in the recipe ( |10[)| ), i.e. Cij = 1—Sij and p = aN. The disorder average 
thus involves only the patterns with /_i > 1. In view of our objective to write (|125| ) in the form ( |112| ), 
we will substitute the observables defined in (10£.110,111) whenever possible. Now ( |125 ) gives 



N 



log 



= ia ^ K{t, t; {cr, h}) — « ^ a{t)k{t) + a log 
t t 

We concentrate on the last term: 



+ 0{N-^) (126) 



dxdy e-'-^ ?/ J]h[x(t)-^4=^] ^ivit)' 



N 



N 



dxdy dxdy i[x x+y y x y] 

(27r)2(t-+i) 



dxdydxdy i[x-x+y-y-x-y]+J2^ogcos -^Y.^[x{t)a,{t)+y(t)fH{t)] 
(2vr)2{tm+i) ^ 

' dxdydxdy i[a;.a;+y.y_a;.y]-__i_ ^J^J£(tK(t)+j5(t)ft,(t)]}2+o(^-i) 
(2vr)2{tm+i) 

' dxdydxdy i[x x+y y x yy^J2t A^(t)Ht')Q{t,t')+2^t)y{t')K{t',t)+my{t')Qit,t')]+0{N-'') 
(27r)2{*-+i) 

Together with ( |126|) we have now shown that the disorder average ( p. 251 ) is indeed, in leading order in 
A^, of the form (|112D (as claimed), with 

$[a, k, q, Q, K]=iaY^ K{t, t) - ia ■ k + alog J ^^^^^^^ff^f^ ^^[x■x+y.y-x■y]-l[x■qx+2y■Kx+y■Qy] 



la 



ir(t, t) — ia • A; + a log J 



(2vr)2 
dudv 



(2^) 



e 2 



(127) 



(which, of course, can be simplified further). 



Simplification of the Saddle-Point Equations. We are now in a position to work out equations ( 119 , 12[)| ). 
For the single-time observables this gives d{t) = k(t) and k(t) = a{t), and for the two-time ones: 



Kt,t') 



Qit,t') 



k{t,t') 



-m 



1 . Jdudv u{t)u{t')e-"2iu-'l'^+^'"-^u-2iu-v+v-Qv] 

2 Jdudv Q~\[u(iu+2v Ku-2iu v+v Qv] 

1 . jdudv ^(i)^(f)e-|[«-qw+2^-Fs:w-2itx-t^+'!;-Q^^] 
— at 

2 jdudv Q-\[u(lu+2v Ku 2iu v+v Qv] 

jdudv ^(t)^(f)e-^[^-g"+2^-^^-2^^-^+-"-Q^] 
jdudv e.-\[uqu+2v Ku-2iu v+v Qv\ 



a 5, 



t,t' 



At the physical saddle-point we can use (123) to express all non-zero objects in terms of the observables 
m(t), C{t,t') and G{t,t'), with a clear physical meaning. Thus we find d{t) = 0, k(t) = m{t), and 
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g(t t') = — ai \ , ^ , , = (128 

^(^'^ ) = -2«^ e-|t--^-^-^'^-I^-^>] = "2"^ [(I-G)-C(I-Gt)-] (MO (129) 

Kit,t') + aJ,, = -a^ j^J;,^ ^iiu.Cu-..u.li-GM = ^^""-"^^'M (130) 

(with G^{t,t') = G{t',t), and using standard manipulations of Gaussian integrals). Note that we can 
use the identity (I-G)-^-! = E^>o<^^-I = J2e>oG^ = G{I-G)-^ to compactify (|l3|) to 

k{t, t') = a[G{I-G)-^]{t, t') (131) 

We have now expressed all our objects in terms of the disorder-averaged recall overlap m = {m{t)} and 
the disorder-averaged single-site correlation- and response functions C = {C{t, t')} and G = {G{t, t')}. 
We can next simplify the effective measure ( p.24| ), which plays a crucial role in the remaining saddle- 
point equations. Inserting a{t) = q{t,t') = and k{t) = ra{t) into ( |124| ), first of all, gives us 

M[M,{/i},{M] =7ro(a(0)) x 

J2t{Mt)[h{t)~rn{t)-9(t)-Y,^,K{t,t')a{t')]+Pa{t+l)h(^^^^ ^^^2) 

Secondly, causality ensures that G{t,t') = for t < t' , from which, in combination with ( p.3lD , it 
follows that the same must be true for the kernel K{t,t'), since 

K{t,t') = a[G{I-G)-^]{t,t') = a{G + G^ + G^ + ...}{t,t') 

This, in turn, guarantees that the function M[. . .] in (|132| ) is already normalised: 

[{dhdh} M[{a},{h},{h}] = l 

One can prove this iteratively. After summation over a{tm) (which due to causality cannot occur in 
the term with the kernel K{t,t')) one is left with just a single occurrence of the field h{tm) in the 
exponent, integration over which reduces to 6[h{tm)], which then eliminates the conjugate field h{tm)- 
This cycle of operations is next applied to the variables at time tm — ^, etc. The effective measure 
( |124| ) can now be written simply as 

if [{a}, {h}, {h}]U = E fidhdh} M[{a}, {h}, {h}] f[{a}, {h}, {h}] 

with M[. . .] as given in (|132| ). The remaining saddle-point equations to be solved, which can be slightly 
simplified by using the identity (cr(t)/i(t'))^ = id{a(t))i,/d6(t'), are 

m{t) = {a{t)), C{t,t') = {a{t)a{t'))^ G{t,t') = d{a{t))J de{t') (133) 



Extracting the Physics from the Saddle-Point Equations. At this stage we observe in ( |133| ) that we 
only need to insert functions of spin states into the effective measure (••.)★ (rather than fields or 
conjugate fields), so the effective measure can again be simplified. Upon inserting (|12S|J131 ) into the 
function ( p2D we obtain (/[{ct}])^ = X;<x{o)-a(t™) Pi'ob[{CT}] f[{a]\, with 



Prob[{a}] =vro(a(0)) {d(t>} P[{0}] \{ 



-[l+a(t+l)tanh[/3Mi|{c^},W)] 



(134) 
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in which 7ro(o-(0)) = ^[l+(T(0)mo], and 

h{t\W}, W}) = Mt) + + (^Y.\^G{l-G)-\t,t')a{t') + a^4>{t) (135) 



t'<t 

{t,t')4>{t') 



P\W] = — n T (136) 

(27r)(t™+i)/2det-2 (I-Gt)C-i(I-G) 



(note: to predict neuron states up until time tm we only need the fields up until time tm — !)• We 
recognise ( |134| ) as describing an effective single neuron, with the usual dynamics Prob[o"(t+l) = ±1] = 
i[l lb tanh[/3/i(t)]], but with the fields ( [1351 ). This result is indeed of the form (|98|), with a retarded 
self-interaction kernel R{t,t') and covariance matrix {(j){t)(p{t')) of the Gaussian (p(t) given by 

R{t,t') = [G{I-G)-']{t,t') mW)) = [{1-G)-^C{1-G^r\t,t') (137) 

For a we loose all the complicated terms in the local fields, and recover the type of simple 
expression we found earlier for finite p: m(t+l) = tanh[/3(m(t)+0(t))]. 

It can be shown p5|| (space limitations prevent a demonstration in this paper) that the equilibrium 
solutions obtained via replica theory in replica-symmetric ansatz [^] can be recovered as those time- 
translation invariant solutions^ of the above dynamic equations which (i) obey the parallel dynamics 
fluctuation-dissipation theorem, and (ii) obey limr^oo G{t) = 0. It can also be shown that the AT 
[ p7| instability, where replica symmetry ceases to hold, corresponds to a dynamical instability in the 
present formalism, where so-called anomalous response sets in: limT-^oo G{t) ^ 0. 

Before we calculate the solution explicitly for the first few time-steps, we first work out the relevant 
averages using ( pl ). Note that always C{t,t) = (^^(t))^ = 1 and G{t,t') = R{t,t') = for t < t' . As 
a result the covariance matrix of the Gaussian fields can be written as 

mW)) = [{I-G)-'C{I-G^)-\t,t') = J2 [6t,s+R{t,s)]C{s,s')[6,,,t'+R{t',s')] 

s,s'>0 

t t' 

= J2J2i^t^s+R{t,s)]Cis,s')[Ss',t'+Rit',s')] (138) 

s=Os'=0 

Considering arbitrary positive integer powers of the response function immediately shows that 

{G^){t,t') = if t'>t-i (139) 

which, in turn, gives 

R{t,t')=Y.iG%,t') = J2iG%,t') (140) 

Similarly we obtain from (I— G)^^ = I+-R that for t' > t: {1—G)~^{t,t') = 5t^t' ■ To suppress notation 
we will simply put h{t\..) instead of h{t\{a} ^ this need not cause any ambiguity. We notice 

that summation over neuron variables (j{s) and integration over Gaussian variables (j){s) with time 
arguments s higher than than those occurring in the function to be averaged can always be carried 
out immediately, giving (for t > and t' <t): 

r 1 
m{t)= J2 vro(f7(0)) /{#}P[{0}] tanh[/3/i(t-l|..)] ]J - [l+a(s+l)tanh[/3/i(s|..)]] (141) 

a{0)...a{t-l) s=0 ^ 



i{t) = m, C{t,t') = C{t-t') and G{t,t') = G{t-t') 
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G{t,t') = (3lc{t,t' + l) - M(^i.O)) J{d(p}P[W}] tanh[/3/i(t-l|..)]tanh[/?/i(t'|..)] 



a(0)...a(t-l) 



t-2 



Y[-[l+a{s + l)tanh[ph{s\..)]]\ (142) 



X . . 

(which we obtain directly for t' = t — 1, and which fohows for times t' < t — 1 upon using the identity 
f7[l — tanh^(x)] = [l + cjtanh(x)][cj — tanh(x)]). For the correlations we distinguish between t' = t — 1 
and t' < t-1: 

C{t,t-1)= J2 ^o(f^(0)) /"{#}P[{(/)}] tanh[/5/i(t-l|..)]tanh[/3/i(t-2|..)] n 2 [^+^(^ + ^)*^"^['^^(^l•• 
(143) 

whereas for i' < t — 1 we have 

t-2 . 

C{t,t')= vro(a(0)) /{#}P[{(A}] tanh[/3/i(t-l|.0]a(t')n 2 [1+^(^+1) tanh[/3/i(s^ (144) 

a(0)...cr(t-l) s=0 

Let us finally work out explicitly the final macroscopic laws ( 141 ,142, 143| , 144] ), with ( [135 , 136| ), for the 
first few time-steps. For arbitrary times our equations will have to be evaluated numerically; we will 
see below, however, that this can be done in an iterative (i.e. easy) manner. At t = we just have 
the two observables m(0) = m-o and C(0, 0) = 1. 

The First Few Time-Steps. The field at t = is /i(0|..) = mo + 0(O)+a2(/)(O), since the retarded self- 
interaction does not yet come into play. The distribution of 0(0) is fully characterised by its variance, 
which ( |138| ) claims to be 

((/>2(0))=C(0,0) = 1 

\ 12 

Therefore, with Dz = (27r)~2e~2^ dz, we immediately find ( |141| , |142| , |143| jl44 ) reducing to 



I) = j Dz i3Jih[[5{mo+e{{))+zyJa)] C(l, 0) = mom(l) (145) 

G{l,f)) = [3^1- j Dz tanh2[/3(mo+0(O) + zVa)]| (146) 



For the self-interaction kernel this implies, using ( |14[1| ), that i?(l,0) = G(1,0). We now move on to 
t = 2. Here equations (|l4l],|4|,|l4|,|l4|) give us 



"^(2) = \Y. I dmdmP[m,m] tanh[/3/i(l|..)][l+a(0)mo] 
C(2,l) = \Y1 /#(1)#(0)-P[</'(0)><^(1)] tanh[/?/i(l|..)]tanh[/3/i(0|..)][l+fT(0)mo] 



<7{0)'^ 

C(2,0) = ^ f{d^}P[W] tanh[/3/i(l|..)](7(0)^ [l+a(l)tanh[/3/i(0|..)]] [l+a(0)mo] 

G{2,1)=p\i-\Y /#(0)#(l)P[</'(0),(/.(l)] tanh2[/3/i(l|..)][l+a(0)mo] 
I ^ -(0) ^ 

G(2,0) =(3lc{2,l)-^Y /#(0)#(l)-P['/'(0),(/'(l)] tanh[/?/i(l|..)]tanh[/3/i(0|..)][l+(T(0)mo] \ =0 
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We already know that {(j)'^{0)) = 1; the remaining two moments we need in order to determine 
P[0(O),^(1)] fohow again from (p^ ): 

1 

m)m) = Y.i^i,s+So,sR{l,0)]C{s,0) = C(1,0) + G(1,0) 

1 1 

{cl)\l)) =Y,J2i^i,s + So,sRil,0)]C{s,s')[6s',i+5,,^oRihO)] = G2(1,0) +2C(0,1)G(1,0) +1 

s=0 s'=l 

We now know P[(j){0), </>(!)] and can work out all macroscopic objects with t = 2 explicitly, if we wish. 
I will not do this here in full, but only point at the emerging pattern of all calculations at a given time 
t depending only on macroscopic quantities that have been calculated at times t' < t, which allows 
for iterative solution. Let us just work out m{2) explicitly, in order to compare the first two recall 
overlaps m(l) and m(2) with the values found in simulations and in approximate theories. We note that 
calculating m(2) only requires the field for which we found (0^(1)) = ^^(l, 0)+2C(0, 0)+l: 

m(2) = \Y1 fdH'^)P[^i^)] tanh[/3(m(l)+0(l)+QG(l,O)a(O)+Q^(l))][l+a(O)mo] 

= ^[1+mo] Jdz tanh[/3(m(l)+6l(l)+aG(l,0)+zya[G2(l,0)+2mom(l)G(l,0) + l])] 
+ ^[1-mo] J Dz tanh[/3(m(l)+0(l)-aG(l,O)+zya[G2(l,O) + 2mom(l)G(l,O) + l])] 



Exact Results Versus Simulations and Gaussian Approximations. I close this section on the fully 
connected networks with a comparison of some of the approximate theories, the (exact) generating 
functional formalism, and numerical simulations, for the case 9{t) = (no external stimuli at any 
time). The evolution of the recall overlap in the first two time-steps has been described as follows: 



Naive Gaussian Approximation : 



Amari — Maginu Theory : 



Exact Solution : 



m(l) 
m(2) 

m(l) 
m(2) 
S2 = 
G = 

m(l) 
m(2) 

S2 = 
G = 



jDz tanh[/3(m(0) + z^)] 
jDz tanh[/?(m(l) + z^)] 

JDz tanh[/?(m(0) + z^)] 
JDz tanh[/3(m(l) + zS^)] 



- 2m(0)m(l)G + G2 

1 - JDz tanh2[/3(m(0) + z^)] 



JDz tanh[/?(m(0) + z^/a)] 

^[l+mo]jDz tanh[/3(m(l) + aG + zS^)] 

+ ^ [1 -mo] / Dz tanh[/3(m(l) - aG + zS^)] 
l + 2m(0)m(l)G + G2 
pll- JDz tanh2[/3(m(0) + z^)] 



We can now appreciate why the more advanced Gaussian approximation (Amari-Maginu theory, ||T^) 
works well when the system state is close to the target attractor. This theory gets the moments of 
the Gaussian part of the interference noise distribution at t = 1 exactly right, but not the discrete 
part, whereas close to the attractor both the response function G(1,0) and one of the two pre-factors 
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m 




Figure 8: The first few time-steps in the evolution of the overlap m{a) = N^'^ ^i^i i^i ^ parallel 
dynamics Hopfield model with a = T = 0.1 and random patterns, following initial states correlated with 
pattern one only. Left: simulations (o) versus naive Gaussian approximation (•). Middle: simulations 
(o) versus advanced Gaussian approximation (Amari-Maginu theory, •). Right: simulations (o) versus 
(exact) generating functional theory (•). All simulations were done with A^ = 30,000. 



[1 lb m-o] in the exact expression for m{2) will be very small, and the latter will therefore indeed 



approach a Gaussian shape. One can also see why the non-Gaussian approximation of |17] made 
sense: in the calculation of m(2) the interference noise distribution can indeed be written as the sum 
of two Gaussian ones (although for t > 2 this will cease to be true). Numerical evaluation of these 
expressions result in explicit predictions which can be tested against numerical simulations. This is 
done in figure ^, which confirms the picture sketched above, and hints that the performance of the 
Gaussian approximations is indeed worse for those initial conditions which fail to trigger pattern recall. 



5.4 Extremely Diluted Attractor Networks Near Saturation 

Extremely diluted attractor networks are obtained upon choosing lim^v^oo c/N = (while still c — > cxd) 
in definition ( |100| ) of the Hebbian-type synapses. The disorder average now involves both the patterns 
with n > 1 and the realisation of the 'wiring' variables Cij S {0,1}. Again, in working out the key 
function (125) we will show that for N ^ oo the outcome can be written in terms of the macroscopic 
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quantities (10£ , 110 ,111). We carry out the average over the spatial structure variables {cij} first: 

1 



logll 



At this stage we have to distinguish between symmetric and asymmetric dilution. 

The Disorder Average. First we deal with the case of symmetric dilution: Cij = cji for all i ^ j. The 
average over the Cjj, with the distribution ( |101| ), is trivial: 



'-^'^ Eu ^z^C Ei (*)<^> (*)] 



n 1+ 



iV' 



-1] 



1 



EjrcEt['*'W'^i(*^i(*)<^i(*)i-dF[Ejr«;Ej'^>(*)<^.(*>^.(*)<^'(*)]]'+o 



g JV l^^'ii^j 



i<j 



We separate in the exponent the terms where /x = z/ in the quadratic term (being of the form J2fiu • • Oi 
and the terms with /u = 1. Note: p = ac. We also use the definitions ( |109| , |110|Jlll| ) wherever we can: 

t ^ St 

1 log Jg-^ E,>i EJE.C'^'C^WE, «>i(*)]-4i]v E.^, E^^. E.* C«;?r«;^^ 

Our 'condensed ansatz' implies that for ^ > 1: N~^ Y.iii'^iit) = and A^"5 Y.iiihi{t) = 0(1). 
Thus the first term in the exponent containing the disorder is 0{c), contributing 0{c/N) to T[. . .]. 
We therefore retain only the second term in the exponent. However, the same argument applies 
to the second term. There all contributions can be seen as uncorrelated in leading order, so that 
Ei^j E^^i/ • • • = O(-^p)) giving a non-leading 0{N^^) cumulative contribution to ..]. Thus, 
provided limAr^oo c"""^ = limTv^ooc/A^ = (which we assumed), we have shown that the disorder 
average ( |125| ) is again, in leading order in N ^ of the form ( |112D (as claimed), with 



Symmetric: $[a, fc, q, Q, K] = -ia ■ k - ]-aY^[q{s,t)Q{s,t)+K{s,t)K{t, s)] (147) 



St 



Next we deal with the asymmetric case (102), where Cij and Cji are independent. Again the average 
over the Cij is trivial; here it gives 



jl |g-i'='.- E, «rc; E* h^{t)<r, (t)^-ic,, Y., ers; e* '^.W'^^w | 



]^|i+^[g-^E,€rCE.^>w-.w_i]||i+^[g"^E,CCE.^.(*KW_i]| 



2c2 
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{ fl t tM t ) 

(in which the horizontal bars of the two constituent hnes are to be read as connected) 

i<j 

Again we separate in the exponent the terms where fi = ly in the quadratic term (being of the form 
J2f_iu • • •)) s-iid the terms with /i = 1, and use the definitions (|lOg|JTlOt|lTl|) : 



t ^ St 

The scahng arguments given in the symmetric case, based on our 'condensed ansatz', apply again, and 
tell us that the remaining terms with the disorder are of vanishing order in N. We have again shown 



that the disorder average (125) is, in leading order in N, of the form ( |112| ), with 

1 
2 



Asymmetric: ^[a,k,q,Q, K] = —ia ■ k — -a'^q{s,t)Q{s,t) (148) 

St 



Extracting the Physics from the Saddle-Point Equations. First we combine the above two results 
(|14|,|14|) in the following way (with A = 1 for symmetric dilution and A = for asymmetric dilution): 

<!>[a,k,q,Q,K] = -ia k- ^aY,[Q{s,t)Q{s,t) + AK{s,t)K{t,s)] (149) 

^ St 



We can now work out equations ( 119| , 12[)| ) , and use ( |123| ) to express the result at the physical saddle- 



point in terms of the trio {m{t),C{t,t'),G{t,t')}. For the single-time observables this gives (as with 
the fully connected system) a{t) = k{t) and k{t) = a{t); for the two-time ones we find: 

Q{t, t') = -^iaC{t, t') q{t, t') = k{t, t') = aAG{t, t') 

We now observe that the remainder of the derivation followed for the fully connected network can 
be followed with only two minor adjustments to the terms generated by K{t,t') and by Q{t,t'): 
aG{I-G)~^ aAG in the retarded self-interaction, and (I-G)"^C(I-G"f)~^ ^ C in the covariance 
of the Gaussian noise in the effective single neuron problem. This results in the familiar saddle-point 
equations (|133D for an effective single neuron problem, with state probabilities (134) equivalent to the 



dynamics Prob[(T(t+l) = ±1] = ^[1 it tanh[/3/i(t)]], and in which 7ro(fT(0)) = |[l+(T(0)mo] and 

1 -iEtt"^(*)C~V)<^(i') 

h{t\{a},{cl>})=m{t)+0{t)+aAj2G{t,t')a{t')+a-^cPit) P[{,/>}] = ' (150) 

p^t (27r)(*-+i)/2det2C 

Physics of Networks with Asymmetric Dilution. Asymmetric dilution corresponds to A = 0, i.e. there 
is no retarded self-interaction, and the response function no longer plays a role. In ( |150| ) we now only 
retain h{t\ • . .) = m{t)+6{t)-\-a2 <p(t), with {(jP'it)) = C(l, 1) = 1. We now find (|141j ) simply giving 



t-i , 

m{t + l)= J2 ^o(fT(0)) /{#}P[{0}] tanh[/3/i(t|...)][]-[l+a(s+l)tanh[/3/i(s|...)]] 



a{0)...a{t) 
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Figure 9: Phase diagrams of extremely diluted attractor networks. Left: asymmetric dilution, 
Cij and Cji are statistically independent. Solid line: continuous transition, separating a non-recall 
(paramagnetic) region (P) from a recall region (R). The line reaches r = 0atac = 2/7r~ 0.637. 
Right: symmetric dilution, Cij = Cji for all Solid lines: continuous transitions, separating a non- 
recall region (P) from a recall region (R), for a < 1, and from a spin-glass region (SG), for a > 1. 
Dashed-dotted line: the AT instability. The R— >SG line (calculated within RS) reaches T = at 
a^^ = 2/7r ~ 0.637. In RSB the latter is replaced by a new (dashed) line, giving a new storage 
capacity of af^^ = 1. 



Jdz tanh[|3{m{t)+e{t) + z^/a)] 



(151) 



Apparently this is the one case where the simple Gaussian dynamical law (95) is exact at all times. 
Similarly, for t > t' equations ( |142 , 143| ,144) for correlation and response functions reduce to 



C{t,t') 



1 <t>l+<t>l-2C(t-l,t' -l)4,a 
' 2 l-C2(t-l,t'-l) 



27rVl-C^(t-l,i'-l) 



tanh[/3(m(t-l)+6'(t-l)+(/)a^/^)]tanh[/3(m(t'-l)+(9(t'-l)+(/.fcVa)] 

(152) 
(153) 



G{t,t') = I35t,t'+i \l- joz tanh2[/3(m(t-l)+0(t-l)+zVa)]| 

Let us also inspect the stationary state ■m{t) = m, for 9{t) = 0. One easily proves that m = as soon 
as T > 1, using m? = [3m dk[\ — ^ Dzi&Y\}c?[(5{k + Zy/a)\\ < f3m? . A continuous bifurcation occurs 
from the m = state to an m > state when T = 1 — jDz tanh^ [(3 z^/a\. A parametrisation of this 
transition line in the (a, T)-plane is given by 

T{x) = I- J Dz tanli^izx), a{x) = x^T'^{x), x > 

For a = we just jet m = tanh(/?m) so Tc = 1. For T = we obtain the equation m = erf[m/\/2a], 
giving a continuous transition to m > solutions at Oc = 2/7r ~ 0.637. The remaining question 
concerns the nature of the m = state. Inserting m(t) = 9{t) = (for all t) into (|152| ) tells us that 
C{t,t') = f[C{t-l,t'-l)] for t> t' > 0, with 'initial conditions' C{t,0) = m{t)mo, where 



f[C] 



1 <#'a + '/'j~2C0a0i, 
"2 



tanh[/3 "v/a^a] tan]l[|3^/a(f)b 
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In the m = regime we have C(t,0) = for any t > 0, inducing C{t,t') = for any t > t' , due to 
/[O] = 0. Thus we conclude that C{t,t') = 5t^t' in the m = phase, i.e. this phase is para-magnetic 
rather than of a spin-glass type. The resulting phase diagram is given in figure |9|, together with that 
of symmetric dilution (for comparison). 

Physics of Networks with Symmetric Dilution. This is the more complicated situation. In spite of the 
extreme dilution, the interaction symmetry makes sure that the spins still have a sufficient number of 
common ancestors for complicated correlations to build up in finite time. We have 

hit\{a},m) = mit)+eit)+aY,G{t,t')a{t')+a-^c^{t) P[{0}] = . ^ (154) 

f^t (27r)(*-+i)/2det2C 



The effective single neuron problem (134, 154| ) is found to be exactly of the form found also for the 



Gaussian model in [y] (which, in turn, maps onto the parallel dynamics SK model |^^) with the 
synapses Jij = Jo^i^j/N + Jzij/y/N (in which the Zij are symmetric zero-average and unit-variance 
Gaussian variables, and Ja = for all i), with the identification: 

J \fa Jo — > 1 

(this becomes clear upon applying the generating functional analysis to the Gaussian model, page 
limitations prevent me from explicit demonstration here). Since one can show that for Jq > the 
parallel dynamics SK model gives the same equilibrium state as the sequential one, we can now 
immediately write down the stationary solution of our dynamic equations which corresponds to the 
FDT regime, with q = limT-_+oo limt--+oo C{t,t + T): 

q = J tanh.'^ [P{m+z^/aq)] m = j Dz iaT^\^[|3{m + z^/^^q)\ (155) 

These are neither identical to the equations for the fully connected Hopfield model, nor to those of 
the asymmetrically diluted model. Using the equivalence with the (sequential and parallel) SK model 
[ p8[ we can immediately translate the phase transition lines as well, giving: 

SK model Symmetrically Diluted Model 

P^ F : T = Jo for Jo > J T = 1 for a < 1 

P^SG: T = J for Jo < J T = ^ foi a > 1 

F^SG(inRS): T = Jo(l-g) for T < Jq r = l-gforT<l 

SG (in RSB) : Jo = J for T < J a = 1 for T < ^ 

^T-line: T'^ = J'^ jDzcosh-^p[Jom+Jz^] T'^ = a jDzcosh-'^P[m+z^] 

where q = jDz tanli^ P[m + z^^aq]. Note that for T = we have g = 1, so that the equation 
for m reduces to the one found for asymmetric dilution: m = erf [m/\/2a]. However, the phase 
diagram shows that the line F SG is entirely in the RSB region and describes physically unrealistic 
re-entrance (as in the SK model), so that the true transition must be calculated using Parisi's replica- 
symmetry breaking (RSB) formalism (see e.g. [p9|| ), giving here ac = 1. 

The extremely diluted models analysed here were first studied in (asymmetric dilution) and 
[ p3| (symmetric dilution). We note that it is not extreme dilution which is responsible for a drastic sim- 
plification in the macroscopic dynamics in the complex regime (i.e. close to saturation), but rather the 
absence of synaptic symmetry. Any finite degree of synaptic symmetry, whether in a fully connected or 
in an extremely diluted attractor network, immediately generates an effective retarded self-interaction 
in the dynamics, which is ultimately responsible for highly non-trivial 'glassy' dynamics. 
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6 Epilogue 

In this paper I have tried to explain how the techniques from non-equiUbrium statistical mechanics 
can be used to solve the dynamics of recurrent neural networks. As in the companion paper on statics 
in this volume, I have restricted myself to relatively simple models, where one can most clearly see 
the potential and restrictions of these techniques, without being distracted by details. I have dealt 
with binary neurons and graded response neurons, and with fully connected and extremely diluted 
networks, with symmetric but also with non-symmetric synapses. Similar calculations could have been 
done for neuron models which are not based on firing rates, such as coupled oscillators or integrate- 
and-fire type ones, see e.g. [^]. My hope is that bringing together methods and results that have 
so far been mostly scattered over research papers, and by presenting these in a uniform language to 
simplify comparison, I will have made the area somewhat more accessible to the interested outsider. 

At another level I hope to have compensated somewhat for the incorrect view that has sometimes 
surfaced in the past that statistical mechanics applies only to recurrent networks with symmetric 
synapses, and is therefore not likely to have a lasting impact on neuro-biological modeling. This 
was indeed true for equilibrium statistical mechanics, but it is not true for non-equilibrium statistical 
mechanics. This does not mean that there are no practical restrictions in the latter; the golden rule 
of there not being any free lunches is obviously also valid here. Whenever we wish to incorporate 
more biological details in our models, we will have to reduce our ambition to obtain exact solutions, 
work much harder, and turn to our computer at an earlier stage. However, the practical restrictions in 
dynamics are of a quantitative nature (equations tend to become more lengthy and messy) , rather than 
of a qualitative one (in statics the issue of detailed balance decides whether or not we can at all start 
a calculation). The main stumbling block that remains is the issue of spatial structure. Short-range 
models are extremely difficult to handle, and this is likely to remain so for a long time. In statistical 
mechanics the state of the art in short-range models is to be able to identify phase transitions, and 
calculate critical exponents, but this is generally not the type of information one is interested in when 
studying the operation of recurrent neural networks. 

Yet, since dynamical techniques are still far less hampered by the need to impose biologically 
dubious (or even unacceptable) model constraints than equilibrium techniques, and since there are 
now well-established and efficient methods and techniques to obtain model solutions in the form of 
macroscopic laws for large systems (some are exact, some are useful approximations), the future in 
the statistical mechanical analysis of biologically more realistic recurrent neural networks is clearly in 
the non-equilibrium half of the statistical mechanics playing field. 
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